From facklerpw at ornl.gov Mon Oct 2 09:40:28 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 2 Oct 2023 14:40:28 +0000 Subject: [petsc-users] Unexpected performance losses switching to COO interface Message-ID: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: TotalSolve.png Type: image/png Size: 15036 bytes Desc: TotalSolve.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: RHSJacobian.png Type: image/png Size: 16082 bytes Desc: RHSJacobian.png URL: From junchao.zhang at gmail.com Mon Oct 2 09:52:41 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 2 Oct 2023 09:52:41 -0500 Subject: [petsc-users] Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < petsc-users at mcs.anl.gov> wrote: > We finally have xolotl ported to use the new COO interface and the > aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port > to our previous version (using MatSetValuesStencil and the default Mat and > Vec implementations), we expected to see an improvement in performance for > both the "serial" and "cuda" builds (here I'm referring to the kokkos > configuration). > > Attached are two plots that show timings for three different cases. All of > these were run on Ascent (the Summit-like training system) with 6 MPI tasks > (on a single node). The CUDA cases were given one GPU per task (and used > CUDA-aware MPI). 
The labels on the blue bars indicate speedup. In all cases > we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent > as possible. > > The performance of RHSJacobian (where the bulk of computation happens in > xolotl) behaved basically as expected (better than expected in the serial > build). NE_3 case in CUDA was the only one that performed worse, but not > surprisingly, since its workload for the GPUs is much smaller. We've still > got more optimization to do on this. > > The real surprise was how much worse the overall solve times were. This > seems to be due simply to switching to the kokkos-based implementation. I'm > wondering if there are any changes we can make in configuration or runtime > arguments to help with PETSc's performance here. Any help looking into this > would be appreciated. > > The tarballs linked here > > and here > > are profiling databases which, once extracted, can be viewed with > hpcviewer. I don't know how helpful that will be, but hopefully it can give > you some direction. > > Thanks for your help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Tue Oct 3 05:05:22 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Tue, 3 Oct 2023 12:05:22 +0200 Subject: [petsc-users] Concatenation of local-to-global matrix Message-ID: <83F9C3F4-CA98-45F1-ADBB-EB58588B3AC0@corintis.com> I am trying to multiply a sequential PETsc matrix with an mpi PETSc matrix in parallel. The final step is to concatenate the product matrix, which is a local sequential PETSc matrix that is different for every proc, so that I get the full mpi matrix as a result. This has proven to work, but setting the values one by one using a loop is very inefficient and slow. In the following MFE, I am trying to make this concatenation more efficient by setting the values in batches. However it doesn?t work and I am wondering why: """Experimenting with PETSc mat-mat multiplication""" import time import numpy as np from colorama import Fore from firedrake import COMM_SELF, COMM_WORLD from firedrake.petsc import PETSc from mpi4py import MPI from numpy.testing import assert_array_almost_equal from utilities import Print size = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
Returns: PETSc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def get_local_submatrix(A): """Get the local submatrix of A Args: A (mpi PETSc mat): partitioned PETSc matrix Returns: seq mat: PETSc matrix """ local_rows_start, local_rows_end = A.getOwnershipRange() local_rows = local_rows_end - local_rows_start comm = A.getComm() rows = PETSc.IS().createStride( local_rows, first=local_rows_start, step=1, comm=comm ) _, k = A.getSize() # Get the number of columns (k) from A's size cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) # Getting the local submatrix # TODO: To be replaced by MatMPIAIJGetLocalMat() in the future (see petsc-users mailing list). There is a missing petsc4py binding, need to add it myself (and please create a merge request) A_local = A.createSubMatrices(rows, cols)[0] return A_local def create_petsc_matrix_seq(input_array): """Building a sequential PETSc matrix from an array Args: input_array (np array): Input array Returns: seq mat: PETSc matrix """ assert len(input_array.shape) == 2 m, n = input_array.shape matrix = PETSc.Mat().createAIJ(size=(m, n), comm=COMM_SELF) matrix.setUp() matrix.setValues(range(m), range(n), input_array, addv=False) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def multiply_matrices_seq(A_seq, B_seq): """Multiply 2 sequential matrices Args: A_seq (seqaij): local submatrix of A B_seq (seqaij): sequential matrix B Returns: seq mat: PETSc matrix that is the product of A_seq and B_seq """ _, A_seq_cols = A_seq.getSize() B_seq_rows, _ = B_seq.getSize() assert ( A_seq_cols == B_seq_rows ), f"Incompatible matrix sizes for multiplication: {A_seq_cols} != {B_seq_rows}" C_local = A_seq.matMult(B_seq) return C_local def concatenate_local_to_global_matrix(local_matrix, mat_type=None): """Create the global matrix C from the local submatrix local_matrix Args: local_matrix (seqaij): local submatrix of global_matrix partition_like (mpiaij): partitioned PETSc matrix mat_type (str): type of the global matrix. Defaults to None. If None, the type of local_matrix is used. 
Returns: mpi PETSc mat: partitioned PETSc matrix """ local_matrix_rows, local_matrix_cols = local_matrix.getSize() global_rows = COMM_WORLD.allreduce(local_matrix_rows, op=MPI.SUM) # Determine the local portion of the vector size = ((None, global_rows), (local_matrix_cols, local_matrix_cols)) if mat_type is None: mat_type = local_matrix.getType() if "dense" in mat_type: sparse = False else: sparse = True if sparse: global_matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: global_matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) global_matrix.setUp() # The exscan operation is used to get the starting global row for each process. # The result of the exclusive scan is the sum of the local rows from previous ranks. global_row_start = COMM_WORLD.exscan(local_matrix_rows, op=MPI.SUM) if rank == 0: global_row_start = 0 concatenate_start = time.time() # This works but is very inefficient # for i in range(local_matrix_rows): # cols, values = local_matrix.getRow(i) # global_row = i + global_row_start # global_matrix.setValues(global_row, cols, values) all_cols = [] all_values = [] all_global_rows = [i + global_row_start for i in range(local_matrix_rows)] for i in range(len(all_global_rows)): cols, values = local_matrix.getRow(i) # print(f"cols: {cols}, values: {values}") all_cols.append(cols) all_values.append(values) for j in range(local_matrix_cols): columns = [all_cols[i][j] for i in range(len(all_cols))] values = [all_values[i][j] for i in range(len(all_values))] global_matrix.setValues(all_global_rows, columns, values) concatenate_end = time.time() Print( f" -Setting values: {concatenate_end - concatenate_start: 2.2f} s", Fore.GREEN, ) global_matrix.assemblyBegin() global_matrix.assemblyEnd() return global_matrix # -------------------------------------------- # EXP: Multiplication of an mpi PETSc matrix with a sequential PETSc matrix # C = A * B # [m x k] = [m x k] * [k x k] # -------------------------------------------- m, k = 11, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, k)) B_np = np.random.randint(low=0, high=6, size=(k, k)) # Create B as a sequential matrix on each process B_seq = create_petsc_matrix_seq(B_np) A = create_petsc_matrix(A_np) # Getting the correct local submatrix to be multiplied by B_seq A_local = get_local_submatrix(A) # Multiplication of 2 sequential matrices C_local = multiply_matrices_seq(A_local, B_seq) # Creating the global C matrix C = concatenate_local_to_global_matrix(C_local) if size > 1 else C_local # -------------------------------------------- # TEST: Multiplication of 2 numpy matrices # -------------------------------------------- AB_np = np.dot(A_np, B_np) Print(f"MATRIX AB_np [{AB_np.shape[0]}x{AB_np.shape[1]}]") Print(f"{AB_np}") # Get the local values from C local_rows_start, local_rows_end = C.getOwnershipRange() C_local = C.getValues(range(local_rows_start, local_rows_end), range(k)) # Assert the correctness of the multiplication for the local subset assert_array_almost_equal(C_local, AB_np[local_rows_start:local_rows_end, :], decimal=5) Any idea how to fix this? Thanks, Thanos -------------- next part -------------- An HTML attachment was scrubbed... 
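One reason the batched loop above cannot work as written: petsc4py's Mat.setValues, when given a list of rows and a list of columns, addresses the full dense logical block rows x cols and expects len(rows)*len(cols) values in row-major order, while each AIJ row generally has a different set of nonzero columns, so column-wise batching cannot describe the sparsity pattern (and passes the wrong number of values). A minimal, self-contained illustration of that block semantics, with toy sizes and values that are not taken from the MFE:

import numpy as np
from petsc4py import PETSc

# Toy 4x6 sequential AIJ matrix, only to show the block semantics of setValues.
A = PETSc.Mat().createAIJ(size=(4, 6), comm=PETSc.COMM_SELF)
A.setUp()

# Rows [0, 1] and columns [2, 5] address the dense logical block {0,1} x {2,5}:
# exactly len(rows) * len(cols) = 4 values are expected, in row-major order,
# i.e. entries (0,2), (0,5), (1,2), (1,5).
A.setValues([0, 1], [2, 5], np.array([10.0, 11.0, 20.0, 21.0]))
A.assemble()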
URL: From bsmith at petsc.dev Tue Oct 3 08:07:36 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 3 Oct 2023 09:07:36 -0400 Subject: [petsc-users] Concatenation of local-to-global matrix In-Reply-To: <83F9C3F4-CA98-45F1-ADBB-EB58588B3AC0@corintis.com> References: <83F9C3F4-CA98-45F1-ADBB-EB58588B3AC0@corintis.com> Message-ID: Take a look at MatCreateMPIMatConcatenateSeqMat_MPIAIJ() in src/mat/impls/aij/mpi/mpiaij.c In that file you will find several routines similar to what you are building. Note the preallocation: MatPreallocateBegin(comm, m, n, dnz, onz); for (i = 0; i < m; i++) { PetscCall(MatGetRow_SeqAIJ(inmat, i, &nnz, &indx, NULL)); PetscCall(MatPreallocateSet(i + rstart, nnz, indx, dnz, onz)); PetscCall(MatRestoreRow_SeqAIJ(inmat, i, &nnz, &indx, NULL)); } ... PetscCall(MatSeqAIJSetPreallocation(*outmat, 0, dnz)); PetscCall(MatMPIAIJSetPreallocation(*outmat, 0, dnz, 0, onz)); Probably best to reuse the C code than have slower Python code. > On Oct 3, 2023, at 6:05 AM, Thanasis Boutsikakis wrote: > > I am trying to multiply a sequential PETsc matrix with an mpi PETSc matrix in parallel. The final step is to concatenate the product matrix, which is a local sequential PETSc matrix that is different for every proc, so that I get the full mpi matrix as a result. This has proven to work, but setting the values one by one using a loop is very inefficient and slow. > > In the following MFE, I am trying to make this concatenation more efficient by setting the values in batches. However it doesn?t work and I am wondering why: > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import Print > > size = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > def get_local_submatrix(A): > """Get the local submatrix of A > > Args: > A (mpi PETSc mat): partitioned PETSc matrix > > Returns: > seq mat: PETSc matrix > """ > local_rows_start, local_rows_end = A.getOwnershipRange() > local_rows = local_rows_end - local_rows_start > comm = A.getComm() > rows = PETSc.IS().createStride( > local_rows, first=local_rows_start, step=1, comm=comm > ) > _, k = A.getSize() # Get the number of columns (k) from A's size > cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) > > # Getting the local submatrix > # TODO: To be replaced by MatMPIAIJGetLocalMat() in the future (see petsc-users mailing list). There is a missing petsc4py binding, need to add it myself (and please create a merge request) > A_local = A.createSubMatrices(rows, cols)[0] > return A_local > > > def create_petsc_matrix_seq(input_array): > """Building a sequential PETSc matrix from an array > > Args: > input_array (np array): Input array > > Returns: > seq mat: PETSc matrix > """ > assert len(input_array.shape) == 2 > > m, n = input_array.shape > matrix = PETSc.Mat().createAIJ(size=(m, n), comm=COMM_SELF) > matrix.setUp() > > matrix.setValues(range(m), range(n), input_array, addv=False) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def multiply_matrices_seq(A_seq, B_seq): > """Multiply 2 sequential matrices > > Args: > A_seq (seqaij): local submatrix of A > B_seq (seqaij): sequential matrix B > > Returns: > seq mat: PETSc matrix that is the product of A_seq and B_seq > """ > _, A_seq_cols = A_seq.getSize() > B_seq_rows, _ = B_seq.getSize() > assert ( > A_seq_cols == B_seq_rows > ), f"Incompatible matrix sizes for multiplication: {A_seq_cols} != {B_seq_rows}" > C_local = A_seq.matMult(B_seq) > return C_local > > > def concatenate_local_to_global_matrix(local_matrix, mat_type=None): > """Create the global matrix C from the local submatrix local_matrix > > Args: > local_matrix (seqaij): local submatrix of global_matrix > partition_like (mpiaij): partitioned PETSc matrix > mat_type (str): type of the global matrix. Defaults to None. If None, the type of local_matrix is used. 
> > Returns: > mpi PETSc mat: partitioned PETSc matrix > """ > local_matrix_rows, local_matrix_cols = local_matrix.getSize() > global_rows = COMM_WORLD.allreduce(local_matrix_rows, op=MPI.SUM) > > # Determine the local portion of the vector > size = ((None, global_rows), (local_matrix_cols, local_matrix_cols)) > > if mat_type is None: > mat_type = local_matrix.getType() > > if "dense" in mat_type: > sparse = False > else: > sparse = True > > if sparse: > global_matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > global_matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > global_matrix.setUp() > > # The exscan operation is used to get the starting global row for each process. > # The result of the exclusive scan is the sum of the local rows from previous ranks. > global_row_start = COMM_WORLD.exscan(local_matrix_rows, op=MPI.SUM) > if rank == 0: > global_row_start = 0 > > concatenate_start = time.time() > > # This works but is very inefficient > # for i in range(local_matrix_rows): > # cols, values = local_matrix.getRow(i) > # global_row = i + global_row_start > # global_matrix.setValues(global_row, cols, values) > > all_cols = [] > all_values = [] > all_global_rows = [i + global_row_start for i in range(local_matrix_rows)] > > for i in range(len(all_global_rows)): > cols, values = local_matrix.getRow(i) > # print(f"cols: {cols}, values: {values}") > all_cols.append(cols) > all_values.append(values) > > for j in range(local_matrix_cols): > columns = [all_cols[i][j] for i in range(len(all_cols))] > values = [all_values[i][j] for i in range(len(all_values))] > > global_matrix.setValues(all_global_rows, columns, values) > > concatenate_end = time.time() > Print( > f" -Setting values: {concatenate_end - concatenate_start: 2.2f} s", > Fore.GREEN, > ) > > global_matrix.assemblyBegin() > global_matrix.assemblyEnd() > > return global_matrix > > > # -------------------------------------------- > # EXP: Multiplication of an mpi PETSc matrix with a sequential PETSc matrix > # C = A * B > # [m x k] = [m x k] * [k x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > B_np = np.random.randint(low=0, high=6, size=(k, k)) > > # Create B as a sequential matrix on each process > B_seq = create_petsc_matrix_seq(B_np) > > A = create_petsc_matrix(A_np) > > # Getting the correct local submatrix to be multiplied by B_seq > A_local = get_local_submatrix(A) > > # Multiplication of 2 sequential matrices > C_local = multiply_matrices_seq(A_local, B_seq) > > # Creating the global C matrix > C = concatenate_local_to_global_matrix(C_local) if size > 1 else C_local > > # -------------------------------------------- > # TEST: Multiplication of 2 numpy matrices > # -------------------------------------------- > AB_np = np.dot(A_np, B_np) > Print(f"MATRIX AB_np [{AB_np.shape[0]}x{AB_np.shape[1]}]") > Print(f"{AB_np}") > > # Get the local values from C > local_rows_start, local_rows_end = C.getOwnershipRange() > C_local = C.getValues(range(local_rows_start, local_rows_end), range(k)) > > # Assert the correctness of the multiplication for the local subset > assert_array_almost_equal(C_local, AB_np[local_rows_start:local_rows_end, :], decimal=5) > > > > Any idea how to fix this? > > Thanks, > Thanos > -------------- next part -------------- An HTML attachment was scrubbed... 
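For the concatenation itself, an alternative to inserting values row by row is to extract each rank's local product in CSR form and hand it straight to the parallel constructor, which then handles preallocation internally. The sketch below is untested and assumes a petsc4py version that provides Mat.getValuesCSR() and the csr= argument of Mat.createAIJ(); it covers only the AIJ case (the dense branch of the MFE would need a separate path), and the names follow the MFE above.

from petsc4py import PETSc

def concatenate_local_to_global_csr(local_matrix, comm=PETSc.COMM_WORLD):
    """Stack per-rank sequential AIJ blocks (each local_rows x k) into one MPIAIJ matrix."""
    local_rows, k = local_matrix.getSize()

    # CSR of the local block; its column indices are already global column
    # indices because every local block spans the full width k.
    indptr, indices, data = local_matrix.getValuesCSR()

    # The local row count comes from this rank's block; PETSc decides the
    # column layout (None = PETSC_DECIDE).
    global_matrix = PETSc.Mat().createAIJ(
        size=((local_rows, None), (None, k)),
        csr=(indptr, indices, data),
        comm=comm,
    )
    global_matrix.assemble()
    return global_matrix

Because the ranks contribute their rows in rank order, this gives the same row distribution as the exscan-based loop in the MFE, without any setValues calls.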
From gongding at cn.cogenda.com  Tue Oct  3 12:51:38 2023
From: gongding at cn.cogenda.com (Gong Ding)
Date: Wed, 4 Oct 2023 01:51:38 +0800
Subject: [petsc-users] How to do a precondition in SNES flow
Message-ID: 

Hi all,

I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.

In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.

I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.

However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.

As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.

Any suggestions?

Gong Ding

From bsmith at petsc.dev  Tue Oct  3 13:13:32 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 3 Oct 2023 14:13:32 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: 
Message-ID: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>

   Simply evaluate the Jacobian during your SNESComputeFunction and save it for SNESComputeJacobian.

> On Oct 3, 2023, at 1:51 PM, Gong Ding wrote:
>
> Hi all,
>
> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>
> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>
> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>
> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>
> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>
> Any suggestions?
>
> Gong Ding

From gongding at cn.cogenda.com  Tue Oct  3 13:44:31 2023
From: gongding at cn.cogenda.com (Gong Ding)
Date: Wed, 4 Oct 2023 02:44:31 +0800
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>
References: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>
Message-ID: 

Is there a better choice if I use right preconditioning instead?

Merging the Jacobian into the function evaluation costs performance.

On 2023/10/4 02:13, Barry Smith wrote:
>
>   Simply evaluate the Jacobian during your SNESComputeFunction and save it for SNESComputeJacobian.
>
>> On Oct 3, 2023, at 1:51 PM, Gong Ding wrote:
>>
>> Hi all,
>>
>> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>>
>> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>>
>> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>>
>> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>>
>> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>>
>> Any suggestions?
>>
>> Gong Ding
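In code, Barry's suggestion amounts to forming the (preconditioned) Jacobian inside the residual callback and simply handing the cached matrix back when SNES asks for it. A minimal petsc4py sketch of that pattern; assemble_system and apply_my_preconditioner are hypothetical placeholders for the user's assembly routine and for the explicit P*J, P*rhs scaling discussed here, not PETSc API.

from petsc4py import PETSc

class CachedJacobianProblem:
    """Build the (preconditioned) Jacobian while evaluating the residual; reuse it later."""

    def __init__(self, J_template):
        # Cache with the same layout and nonzero pattern as the Jacobian.
        self.Jcache = J_template.duplicate()

    def residual(self, snes, x, F):
        assemble_system(x, F, self.Jcache)       # hypothetical user assembly of F and J
        apply_my_preconditioner(self.Jcache, F)  # hypothetical explicit scaling: P*J, P*F

    def jacobian(self, snes, x, J, P):
        # The work was already done in residual(); just copy the cached matrix out.
        self.Jcache.copy(J)
        if P.handle != J.handle:
            self.Jcache.copy(P)

# Usage sketch:
#   problem = CachedJacobianProblem(J)
#   snes.setFunction(problem.residual, F)
#   snes.setJacobian(problem.jacobian, J)

Note the trade-off raised in this thread: the residual may be evaluated more often than the Jacobian (for example during line searches), so caching this way can recompute the Jacobian more frequently than a plain SNESComputeJacobian would.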
From knepley at gmail.com  Tue Oct  3 13:47:49 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 3 Oct 2023 14:47:49 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: 
Message-ID: 

On Tue, Oct 3, 2023 at 1:51 PM Gong Ding wrote:

> Hi all,
>
> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>
> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>
> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>
What you are proposing is exactly what PETSc does with left preconditioning: it multiplies both sides by the preconditioner. What do you want to change?

  Thanks,

    Matt

> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>
> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>
> Any suggestions?
>
> Gong Ding
>
-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From bsmith at petsc.dev  Tue Oct  3 13:57:09 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 3 Oct 2023 14:57:09 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: <7A43650F-8584-4E0B-8689-68F73CA35C01@petsc.dev>
Message-ID: 

> On Oct 3, 2023, at 2:44 PM, Gong Ding wrote:
>
> Is there a better choice if I use right preconditioning instead?
>
> Merging the Jacobian into the function evaluation costs performance.
>
   Why? You should still compute the Jacobian only once, just earlier in the process.

> On 2023/10/4 02:13, Barry Smith wrote:
>>
>>   Simply evaluate the Jacobian during your SNESComputeFunction and save it for SNESComputeJacobian.
>>
>>> On Oct 3, 2023, at 1:51 PM, Gong Ding wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>>>
>>> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>>>
>>> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>>>
>>> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>>>
>>> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>>>
>>> Any suggestions?
>>>
>>> Gong Ding

From gongding at cn.cogenda.com  Tue Oct  3 14:04:50 2023
From: gongding at cn.cogenda.com (Gong Ding)
Date: Wed, 4 Oct 2023 03:04:50 +0800
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: 
References: 
Message-ID: <739f4aec-9da4-623d-8d48-973eeab4193e@cn.cogenda.com>

On 2023/10/4 02:47, Matthew Knepley wrote:
> On Tue, Oct 3, 2023 at 1:51 PM Gong Ding wrote:
>
> Hi all,
>
> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>
> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>
> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>
> What you are proposing is exactly what PETSc does with left preconditioning: it multiplies both sides by the preconditioner. What do you want to change?

I'd like to multiply the preconditioning matrix into the Jacobian matrix and then do an LU factorization of the Jacobian, not use an iterative method. Something like

Kelley, C. T. "Newton's Method in Three Precisions." arXiv preprint arXiv:2307.16051 (2023).

BTW: does PETSc plan to support multi-precision?

>   Thanks,
>
>      Matt
>
> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>
> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>
> Any suggestions?
>
> Gong Ding

From knepley at gmail.com  Tue Oct  3 14:32:27 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 3 Oct 2023 15:32:27 -0400
Subject: [petsc-users] How to do a precondition in SNES flow
In-Reply-To: <739f4aec-9da4-623d-8d48-973eeab4193e@cn.cogenda.com>
References: <739f4aec-9da4-623d-8d48-973eeab4193e@cn.cogenda.com>
Message-ID: 

On Tue, Oct 3, 2023 at 3:05 PM Gong Ding wrote:

> On 2023/10/4 02:47, Matthew Knepley wrote:
>
> On Tue, Oct 3, 2023 at 1:51 PM Gong Ding wrote:
>
>> Hi all,
>>
>> I'd like to apply a special Jacobian preconditioner during the SNES iteration, for which the Jacobian matrix and RHS vector must be modified explicitly.
>>
>> In SNESComputeJacobian, the preconditioner P is built after assembly of the Jacobian matrix.
>>
>> I need to multiply P into J and into the RHS vector explicitly, as a left preconditioner, before the solve stage of J*dx = rhs.
>>
> What you are proposing is exactly what PETSc does with left preconditioning: it multiplies both sides by the preconditioner. What do you want to change?
>
> I'd like to multiply the preconditioning matrix into the Jacobian matrix and then do an LU factorization of the Jacobian, not use an iterative method. Something like
>
> Kelley, C. T. "Newton's Method in Three Precisions." arXiv preprint arXiv:2307.16051 (2023).
>
> BTW: does PETSc plan to support multi-precision?
>
1. Tim is just solving the Newton equation with LU. You can do this using -pc_type lu

2. We do not support this kind of multi-precision. We had a plan to do this, but no one to work on it. It does not seem to be a priority of users so far.

  Thanks,

    Matt

> Thanks,
>
> Matt
>
>> However, I find that PETSc evaluates the function before the Jacobian, so P*RHS cannot be computed in SNESComputeFunction.
>>
>> As a result, I need a hook function that runs after SNESComputeJacobian and before the solve stage.
>>
>> Any suggestions?
>>
>> Gong Ding
>>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsosajones at oakland.edu Tue Oct 3 14:55:01 2023 From: gsosajones at oakland.edu (Giselle Sosa Jones) Date: Tue, 3 Oct 2023 15:55:01 -0400 Subject: [petsc-users] Scalapack issue Message-ID: Hello, I have a Mac with M1 chip and I struggled a lot to install PETSc on it. I did it eventually (thanks to your help), but with the latest MacOS update, things stopped working. I am trying to configure the latest version of PETSc, and I have the following error popping up: Cannot use scalapack without Fortran, make sure you do NOT have --with-fc=0 I have gfortran installed with brew. I am going to send my configure.log file to the other mailing list. Thank you for your help in advance. Best, Giselle -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.kramer at imperial.ac.uk Tue Oct 3 23:30:38 2023 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Wed, 4 Oct 2023 15:30:38 +1100 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> Message-ID: <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> Hi Mark Thanks again for re-enabling the square graph aggressive coarsening option which seems to have restored performance for most of our cases. Unfortunately we do have a remaining issue, which only seems to occur for the larger mesh size ("level 7" which has 6,389,890 vertices and we normally run on 1536 cpus): we either get a "Petsc has generated inconsistent data" error, or a hang - both when constructing the square graph matrix. So this is with the new -pc_gamg_aggressive_square_graph=true option, without the option there's no error but of course we would get back to the worse performance. Backtrace for the "inconsistent data" error. Note this is actually just petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge of adams/gamg-add-old-coarsening into main - with one unrelated commit from firedrake [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: j 8 not equal to expected number of sends 9 [0]PETSC ERROR: Petsc Development GIT revision: v3.4.2-43104-ga3b76b71a1? GIT Date: 2023-09-18 10:26:04 +0100 [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a? named gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct? 
4 14:30:41 2023 [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 --with-fortran-bindings=0 --with-zlib --with-c2html=0 --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-hdf5 --download-hypre --download-superlu_dist --download-ptscotch --download-suitesparse --download-pastix --download-hwloc --download-metis --download-scalapack --download-mumps --download-chaco --download-ml CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 [0]PETSC ERROR: #4 MatProductSymbolic() at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 [0]PETSC ERROR: #7 PCSetUp_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 [0]PETSC ERROR: #8 PCSetUp() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 [0]PETSC ERROR: #9 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 [0]PETSC ERROR: #10 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 [0]PETSC ERROR: #11 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 [0]PETSC ERROR: #12 KSPSolve_CG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 [0]PETSC ERROR: #13 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 [0]PETSC ERROR: #14 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 [0]PETSC ERROR: #16 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 [0]PETSC ERROR: #17 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 [0]PETSC ERROR: #18 KSPSolve_PREONLY() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 [0]PETSC ERROR: #19 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 [0]PETSC ERROR: #20 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 [0]PETSC ERROR: #22 SNESSolve() at /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 Last -info :pc messages: [0] PCSetUp(): Setting up PC for first time [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 [0] PCGAMGCreateGraph_AGG(): Filtering left 100. 
% edges in graph (1.588710e+07 1.765233e+06) [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 1 [0] fixAggregatesWithSquare(): isMPI = yes [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: New grid 380144 nodes [0] PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 min=9.015236e-02 PC=jacobi [0] PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra 0.0901524 4.48938 [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: Coarse grid reduction from 1536 to 1536 active processes [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in graph (5.310360e+05 5.353000e+03) [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 2 The hang (on a slightly different model configuration but on the same mesh and n/o cores) seems to occur in the same location. If I use gdb to attach to the running processes, it seems on some cores it has somehow manages to fall out of the pcsetup and is waiting in the first norm calculation in the outside CG iteration: #0? 0x000014cce9999119 in hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so #1? 0x000014ccef2c2737 in _coll_ml_allreduce () from /apps/hcoll/4.7.3202/lib/libhcoll.so.1 #2? 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 #3? 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, recvbuf=, count=1, datatype=, op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 #4? 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, type=, z=, VecNorm_SeqFn=) at /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 #5? VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 #6? 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, val=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 #7? 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 etc. but with other cores still stuck at: #0? 0x000015375cf41e8a in ucp_worker_progress () from /apps/ucx/1.12.0/lib/libucp.so.0 #1? 0x000015377d4bd57b in opal_progress () at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 #2? 0x000015377d4c3ba5 in ompi_sync_wait_mt (sync=sync at entry=0x7ffd6aedf6f0) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 #3? 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 #4? 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 #5? 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 #6? 
0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ (C=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 #7? 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 #8? 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, Gmat1=0x1, Gmat2=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 #9? 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, a_Gmat1=0x1, agg_lists=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS (func=0x15378f302890, args=0x83b3218, nargsf=, kwnames=) at ../Objects/descrobject.c:405 #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, args=0x83b3218, callable=0x15378f302890, tstate=0x23e0020) at ../Include/cpython/abstract.h:114 #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x83b3218, callable=0x15378f302890) at ../Include/cpython/abstract.h:123 #17 call_function (kwnames=0x0, oparg=, pp_stack=, trace_info=0x7ffd6aee0390, tstate=) at ../Python/ceval.c:5867 #18 _PyEval_EvalFrameDefault (tstate=, f=, throwflag=) at ../Python/ceval.c:4198 #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 #20 _PyEval_Vector (tstate=, con=, locals=, args=, argcount=4, kwnames=) at ../Python/ceval.c:5065 #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=, args=0x1, _nargs=, kwargs=) at src/petsc4py/PETSc.c:548022 #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 Let me know if there is anything further we can try to debug this issue Kind regards Stephan Kramer On 02/09/2023 01:58, Mark Adams wrote: > Fantastic! > > I fixed a memory free problem. You should be OK now. > I am pretty sure you are good but I would like to wait to get any feedback > from you. > We should have a release at the end of the month and it would be nice to > get this into it. > > Thanks, > Mark > > > On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer > wrote: > >> Hi Mark >> >> Sorry took a while to report back. We have tried your branch but hit a >> few issues, some of which we're not entirely sure are related. 
>> >> First switching off minimum degree ordering, and then switching to the >> old version of aggressive coarsening, as you suggested, got us back to >> the coarsening behaviour that we had previously, but then we also >> observed an even further worsening of the iteration count: it had >> previously gone up by 50% already (with the newer main petsc), but now >> was more than double "old" petsc. Took us a while to realize this was >> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. >> Switching this also back to the old default we get back to very similar >> coarsening levels (see below for more details if it is of interest) and >> iteration counts. >> >> So that's all very good news. However, we were also starting seeing >> memory errors (double free or corruption) when we switched off the >> minimum degree ordering. Because this was at an earlier version of your >> branch we then rebuild, hoping this was just an earlier bug that had >> been fixed, but then we were having MPI-lockup issues. We have now >> figured out the MPI issues are completely unrelated - some combination >> with a newer mpi build and firedrake on our cluster which also occur >> using main branches of everything. So switching back to an older MPI >> build we are hoping to now test your most recent version of >> adams/gamg-add-old-coarsening with these options and see whether the >> memory errors are still there. Will let you know >> >> Best wishes >> Stephan Kramer >> >> Coarsening details with various options for Level 6 of the test case: >> >> In our original setup (using "old" petsc), we had: >> >> rows=516, cols=516, bs=6 >> rows=12660, cols=12660, bs=6 >> rows=346974, cols=346974, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> Then with the newer main petsc we had >> >> rows=666, cols=666, bs=6 >> rows=7740, cols=7740, bs=6 >> rows=34902, cols=34902, bs=6 >> rows=736578, cols=736578, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> Then on your branch with minimum_degree_ordering False: >> >> rows=504, cols=504, bs=6 >> rows=2274, cols=2274, bs=6 >> rows=11010, cols=11010, bs=6 >> rows=35790, cols=35790, bs=6 >> rows=430686, cols=430686, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> And with minimum_degree_ordering False and use_aggressive_square_graph >> True: >> >> rows=498, cols=498, bs=6 >> rows=12672, cols=12672, bs=6 >> rows=346974, cols=346974, bs=6 >> rows=19169670, cols=19169670, bs=3 >> >> So that is indeed pretty much back to what it was before >> >> >> >> >> >> >> >> >> On 31/08/2023 23:40, Mark Adams wrote: >>> Hi Stephan, >>> >>> This branch is settling down. adams/gamg-add-old-coarsening >>> >>> I made the old, not minimum degree, ordering the default but kept the new >>> "aggressive" coarsening as the default, so I am hoping that just adding >>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests will >>> get you back to where you were before. >>> Fingers crossed ... let me know if you have any success or not. >>> >>> Thanks, >>> Mark >>> >>> >>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: >>> >>>> Hi Stephan, >>>> >>>> I have a branch that you can try: adams/gamg-add-old-coarsening >>>> >>> Things to test: >>>> * First, verify that nothing unintended changed by reproducing your bad >>>> results with this branch (the defaults are the same) >>>> * Try not using the minimum degree ordering that I suggested >>>> with: -pc_gamg_use_minimum_degree_ordering false >>>> -- I am eager to see if that is the main problem. 
>>>> * Go back to what I think is the old method: >>>> -pc_gamg_use_minimum_degree_ordering >>>> false -pc_gamg_use_aggressive_square_graph true >>>> >>>> When we get back to where you were, I would like to try to get modern >>>> stuff working. >>>> I did add a -pc_gamg_aggressive_mis_k <2> >>>> You could to another step of MIS coarsening with >> -pc_gamg_aggressive_mis_k >>>> 3 >>>> >>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters. >>>> >>>> Thanks, >>>> Mark >>>> >>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: >>>> >>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < >> s.kramer at imperial.ac.uk> >>>>> wrote: >>>>> >>>>>> Many thanks for looking into this, Mark >>>>>>> My 3D tests were not that different and I see you lowered the >>>>>> threshold. >>>>>>> Note, you can set the threshold to zero, but your test is running so >>>>>> much >>>>>>> differently than mine there is something else going on. >>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot >>>>>> for >>>>>>> in 3D. >>>>>>> >>>>>>> So it is not clear what the problem is. Some questions: >>>>>>> >>>>>>> * do you have a picture of this mesh to show me? >>>>>> It's just a standard hexahedral cubed sphere mesh with the refinement >>>>>> level giving the number of times each of the six sides have been >>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to 16 >>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x >> 16 = >>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) >> going >>>>>> to the next Level >>>>>> >>>>> I see, and I assume these are pretty stretched elements. >>>>> >>>>> >>>>>>> * what do you mean by Q1-Q2 elements? >>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for velocity >>>>>> and (tri)linear for pressure >>>>>> >>>>>> I guess you could argue we could/should just do good old geometric >>>>>> multigrid instead. More generally we do use this solver configuration >> a >>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our >>>>>> adaptive mesh runs - would it be worth to see if we have the same >>>>>> performance issues with tetrahedral P2-P1? >>>>>> >>>>> No, you have a clear reproducer, if not minimal. >>>>> The first coarsening is very different. >>>>> >>>>> I am working on this and I see that I added a heuristic for thin bodies >>>>> where you order the vertices in greedy algorithms with minimum degree >> first. >>>>> This will tend to pick corners first, edges then faces, etc. >>>>> That may be the problem. I would like to understand it better (see >> below). >>>>> >>>>> >>>>>>> It would be nice to see if the new and old codes are similar without >>>>>>> aggressive coarsening. >>>>>>> This was the intended change of the major change in this time frame >> as >>>>>> you >>>>>>> noticed. >>>>>>> If these jobs are easy to run, could you check that the old and new >>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you >> only >>>>>> need >>>>>>> one time step). >>>>>>> All you need to do is check that the first coarse grid has about the >>>>>> same >>>>>>> number of equations (large). >>>>>> Unfortunately we're seeing some memory errors when we use this option, >>>>>> and I'm not entirely clear whether we're just running out of memory >> and >>>>>> need to put it on a special queue. 
>>>>>> >>>>>> The run with square_graph 0 using new PETSc managed to get through one >>>>>> solve at level 5, and is giving the following mg levels: >>>>>> >>>>>> rows=174, cols=174, bs=6 >>>>>> total: nonzeros=30276, allocated nonzeros=30276 >>>>>> -- >>>>>> rows=2106, cols=2106, bs=6 >>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 >>>>>> -- >>>>>> rows=21828, cols=21828, bs=6 >>>>>> total: nonzeros=62588232, allocated nonzeros=62588232 >>>>>> -- >>>>>> rows=589824, cols=589824, bs=6 >>>>>> total: nonzeros=1082528928, allocated nonzeros=1082528928 >>>>>> -- >>>>>> rows=2433222, cols=2433222, bs=3 >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>> >>>>>> comparing with square_graph 100 with new PETSc >>>>>> >>>>>> rows=96, cols=96, bs=6 >>>>>> total: nonzeros=9216, allocated nonzeros=9216 >>>>>> -- >>>>>> rows=1440, cols=1440, bs=6 >>>>>> total: nonzeros=647856, allocated nonzeros=647856 >>>>>> -- >>>>>> rows=97242, cols=97242, bs=6 >>>>>> total: nonzeros=65656836, allocated nonzeros=65656836 >>>>>> -- >>>>>> rows=2433222, cols=2433222, bs=3 >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>> >>>>>> and old PETSc with square_graph 100 >>>>>> >>>>>> rows=90, cols=90, bs=6 >>>>>> total: nonzeros=8100, allocated nonzeros=8100 >>>>>> -- >>>>>> rows=1872, cols=1872, bs=6 >>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 >>>>>> -- >>>>>> rows=47652, cols=47652, bs=6 >>>>>> total: nonzeros=23343264, allocated nonzeros=23343264 >>>>>> -- >>>>>> rows=2433222, cols=2433222, bs=3 >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>> -- >>>>>> >>>>>> Unfortunately old PETSc with square_graph 0 did not complete a single >>>>>> solve before giving the memory error >>>>>> >>>>> OK, thanks for trying. >>>>> >>>>> I am working on this and I will give you a branch to test, but if you >> can >>>>> rebuild PETSc here is a quick test that might fix your problem. >>>>> In src/ksp/pc/impls/gamg/agg.c you will see: >>>>> >>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); >>>>> >>>>> If you can comment this out in the new code and compare with the old, >>>>> that might fix the problem. >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>>>> BTW, I am starting to think I should add the old method back as an >>>>>> option. >>>>>>> I did not think this change would cause large differences. >>>>>> Yes, I think that would be much appreciated. Let us know if we can do >>>>>> any testing >>>>>> >>>>>> Best wishes >>>>>> Stephan >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Note that we are providing the rigid body near nullspace, >>>>>>>> hence the bs=3 to bs=6. >>>>>>>> We have tried different values for the gamg_threshold but it doesn't >>>>>>>> really seem to significantly alter the coarsening amount in that >> first >>>>>>>> step. >>>>>>>> >>>>>>>> Do you have any suggestions for further things we should try/look >> at? 
>>>>>>>> Any feedback would be much appreciated >>>>>>>> >>>>>>>> Best wishes >>>>>>>> Stephan Kramer >>>>>>>> >>>>>>>> Full logs including log_view timings available from >>>>>>>> https://github.com/stephankramer/petsc-scaling/ >>>>>>>> >>>>>>>> In particular: >>>>>>>> >>>>>>>> >>>>>>>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> From mfadams at lbl.gov Wed Oct 4 09:11:46 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 4 Oct 2023 10:11:46 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> Message-ID: Thanks Stephan, It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat is waiting for messages that were not sent. A bug. Can you try my branch, which is ready to merge, adams/gamg-fast-filter. We added a new filtering method in main that uses low memory but I found it was slow, so this branch brings back the old filter code, used by default, and keeps the low memory version as an option. It is possible this low memory filtering messed up the internals of the Mat in some way. I hope this is it, but if not we can continue. This MR also makes square graph the default. I have found it does create better aggregates and on GPUs, with Kokkos bug fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs) Mark On Wed, Oct 4, 2023 at 12:30?AM Stephan Kramer wrote: > Hi Mark > > Thanks again for re-enabling the square graph aggressive coarsening > option which seems to have restored performance for most of our cases. > Unfortunately we do have a remaining issue, which only seems to occur > for the larger mesh size ("level 7" which has 6,389,890 vertices and we > normally run on 1536 cpus): we either get a "Petsc has generated > inconsistent data" error, or a hang - both when constructing the square > graph matrix. So this is with the new > -pc_gamg_aggressive_square_graph=true option, without the option there's > no error but of course we would get back to the worse performance. > > Backtrace for the "inconsistent data" error. 
Note this is actually just > petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge > of adams/gamg-add-old-coarsening into main - with one unrelated commit > from firedrake > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: j 8 not equal to expected number of sends 9 > [0]PETSC ERROR: Petsc Development GIT revision: > v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100 > [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named > gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023 > [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix > --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 > --with-fortran-bindings=0 --with-zlib --with-c2html=0 > --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx > --with-fc=mpifort --download-hdf5 --download-hypre > --download-superlu_dist --download-ptscotch --download-suitesparse > --download-pastix --download-hwloc --download-metis --download-scalapack > --download-mumps --download-chaco --download-ml > CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 > [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at > /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 > [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 > [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > [0]PETSC ERROR: #4 MatProductSymbolic() at > /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > [0]PETSC ERROR: #7 PCSetUp_GAMG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > [0]PETSC ERROR: #8 PCSetUp() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > [0]PETSC ERROR: #9 PCApply() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > [0]PETSC ERROR: #10 PCApply() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > [0]PETSC ERROR: #11 KSP_PCApply() at > /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > [0]PETSC ERROR: #12 KSPSolve_CG() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > [0]PETSC ERROR: #13 KSPSolve_Private() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 > [0]PETSC ERROR: #14 KSPSolve() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at > > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 > [0]PETSC ERROR: #16 PCApply() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > [0]PETSC ERROR: #17 KSP_PCApply() at > /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > [0]PETSC ERROR: #18 KSPSolve_PREONLY() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 > [0]PETSC ERROR: #19 KSPSolve_Private() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 > [0]PETSC ERROR: #20 KSPSolve() at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at > 
/jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 > [0]PETSC ERROR: #22 SNESSolve() at > /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 > > Last -info :pc messages: > > [0] PCSetUp(): Setting up PC for first time > [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) > N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 > [0] PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in > graph (1.588710e+07 1.765233e+06) > [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > Square Graph on level 1 > [0] fixAggregatesWithSquare(): isMPI = yes > [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: > New grid 380144 nodes > [0] PCGAMGOptProlongator_AGG(): > Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 > min=9.015236e-02 PC=jacobi > [0] PCGAMGOptProlongator_AGG(): > Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra > 0.0901524 4.48938 > [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: > Coarse grid reduction from 1536 to 1536 active processes > [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) > N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes > [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in > graph (5.310360e+05 5.353000e+03) > [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > Square Graph on level 2 > > The hang (on a slightly different model configuration but on the same > mesh and n/o cores) seems to occur in the same location. If I use gdb to > attach to the running processes, it seems on some cores it has somehow > manages to fall out of the pcsetup and is waiting in the first norm > calculation in the outside CG iteration: > > #0 0x000014cce9999119 in > hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from > /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so > #1 0x000014ccef2c2737 in _coll_ml_allreduce () from > /apps/hcoll/4.7.3202/lib/libhcoll.so.1 > #2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, > rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , > op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) > at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 > #3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, > recvbuf=, count=1, datatype=, > op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 > #4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, > type=, z=, VecNorm_SeqFn=) > at > > /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 > #5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at > /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 > #6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, > val=0x22d) at > /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 > #7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 > etc. 
> > but with other cores still stuck at: > > #0 0x000015375cf41e8a in ucp_worker_progress () from > /apps/ucx/1.12.0/lib/libucp.so.0 > #1 0x000015377d4bd57b in opal_progress () at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 > #2 0x000015377d4c3ba5 in ompi_sync_wait_mt > (sync=sync at entry=0x7ffd6aedf6f0) at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 > #3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, > requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at > > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 > #4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, > indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 > #5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ > (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 > #6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ > (C=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > #7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > #8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, > Gmat1=0x1, Gmat2=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > #9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, > a_Gmat1=0x1, agg_lists=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply > (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, > __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 > #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS > (func=0x15378f302890, args=0x83b3218, nargsf=, > kwnames=) at ../Objects/descrobject.c:405 > #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, > nargsf=, args=0x83b3218, callable=0x15378f302890, > tstate=0x23e0020) at ../Include/cpython/abstract.h:114 > #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, > args=0x83b3218, callable=0x15378f302890) at > ../Include/cpython/abstract.h:123 > #17 call_function (kwnames=0x0, oparg=, > pp_stack=, trace_info=0x7ffd6aee0390, > tstate=) at ../Python/ceval.c:5867 > #18 _PyEval_EvalFrameDefault (tstate=, f=, > throwflag=) at ../Python/ceval.c:4198 > #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, > tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 > #20 _PyEval_Vector (tstate=, con=, > locals=, args=, argcount=4, > kwnames=) at ../Python/ceval.c:5065 > #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func= out>, args=0x1, _nargs=, kwargs=) at > src/petsc4py/PETSc.c:548022 > #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, > __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 > #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > #24 
0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, > y=0xc0fe132c) at > /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at > /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > > Let me know if there is anything further we can try to debug this issue > > Kind regards > Stephan Kramer > > > On 02/09/2023 01:58, Mark Adams wrote: > > Fantastic! > > > > I fixed a memory free problem. You should be OK now. > > I am pretty sure you are good but I would like to wait to get any > feedback > > from you. > > We should have a release at the end of the month and it would be nice to > > get this into it. > > > > Thanks, > > Mark > > > > > > On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer > > wrote: > > > >> Hi Mark > >> > >> Sorry took a while to report back. We have tried your branch but hit a > >> few issues, some of which we're not entirely sure are related. > >> > >> First switching off minimum degree ordering, and then switching to the > >> old version of aggressive coarsening, as you suggested, got us back to > >> the coarsening behaviour that we had previously, but then we also > >> observed an even further worsening of the iteration count: it had > >> previously gone up by 50% already (with the newer main petsc), but now > >> was more than double "old" petsc. Took us a while to realize this was > >> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. > >> Switching this also back to the old default we get back to very similar > >> coarsening levels (see below for more details if it is of interest) and > >> iteration counts. > >> > >> So that's all very good news. However, we were also starting seeing > >> memory errors (double free or corruption) when we switched off the > >> minimum degree ordering. Because this was at an earlier version of your > >> branch we then rebuild, hoping this was just an earlier bug that had > >> been fixed, but then we were having MPI-lockup issues. We have now > >> figured out the MPI issues are completely unrelated - some combination > >> with a newer mpi build and firedrake on our cluster which also occur > >> using main branches of everything. So switching back to an older MPI > >> build we are hoping to now test your most recent version of > >> adams/gamg-add-old-coarsening with these options and see whether the > >> memory errors are still there. 
Will let you know > >> > >> Best wishes > >> Stephan Kramer > >> > >> Coarsening details with various options for Level 6 of the test case: > >> > >> In our original setup (using "old" petsc), we had: > >> > >> rows=516, cols=516, bs=6 > >> rows=12660, cols=12660, bs=6 > >> rows=346974, cols=346974, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> Then with the newer main petsc we had > >> > >> rows=666, cols=666, bs=6 > >> rows=7740, cols=7740, bs=6 > >> rows=34902, cols=34902, bs=6 > >> rows=736578, cols=736578, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> Then on your branch with minimum_degree_ordering False: > >> > >> rows=504, cols=504, bs=6 > >> rows=2274, cols=2274, bs=6 > >> rows=11010, cols=11010, bs=6 > >> rows=35790, cols=35790, bs=6 > >> rows=430686, cols=430686, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> And with minimum_degree_ordering False and use_aggressive_square_graph > >> True: > >> > >> rows=498, cols=498, bs=6 > >> rows=12672, cols=12672, bs=6 > >> rows=346974, cols=346974, bs=6 > >> rows=19169670, cols=19169670, bs=3 > >> > >> So that is indeed pretty much back to what it was before > >> > >> > >> > >> > >> > >> > >> > >> > >> On 31/08/2023 23:40, Mark Adams wrote: > >>> Hi Stephan, > >>> > >>> This branch is settling down. adams/gamg-add-old-coarsening > >>> < > https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening> > >>> I made the old, not minimum degree, ordering the default but kept the > new > >>> "aggressive" coarsening as the default, so I am hoping that just adding > >>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests > will > >>> get you back to where you were before. > >>> Fingers crossed ... let me know if you have any success or not. > >>> > >>> Thanks, > >>> Mark > >>> > >>> > >>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: > >>> > >>>> Hi Stephan, > >>>> > >>>> I have a branch that you can try: adams/gamg-add-old-coarsening > >>>> < > https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening > >>>> Things to test: > >>>> * First, verify that nothing unintended changed by reproducing your > bad > >>>> results with this branch (the defaults are the same) > >>>> * Try not using the minimum degree ordering that I suggested > >>>> with: -pc_gamg_use_minimum_degree_ordering false > >>>> -- I am eager to see if that is the main problem. > >>>> * Go back to what I think is the old method: > >>>> -pc_gamg_use_minimum_degree_ordering > >>>> false -pc_gamg_use_aggressive_square_graph true > >>>> > >>>> When we get back to where you were, I would like to try to get modern > >>>> stuff working. > >>>> I did add a -pc_gamg_aggressive_mis_k <2> > >>>> You could to another step of MIS coarsening with > >> -pc_gamg_aggressive_mis_k > >>>> 3 > >>>> > >>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters. > >>>> > >>>> Thanks, > >>>> Mark > >>>> > >>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: > >>>> > >>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < > >> s.kramer at imperial.ac.uk> > >>>>> wrote: > >>>>> > >>>>>> Many thanks for looking into this, Mark > >>>>>>> My 3D tests were not that different and I see you lowered the > >>>>>> threshold. > >>>>>>> Note, you can set the threshold to zero, but your test is running > so > >>>>>> much > >>>>>>> differently than mine there is something else going on. > >>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to > shoot > >>>>>> for > >>>>>>> in 3D. 
> >>>>>>> > >>>>>>> So it is not clear what the problem is. Some questions: > >>>>>>> > >>>>>>> * do you have a picture of this mesh to show me? > >>>>>> It's just a standard hexahedral cubed sphere mesh with the > refinement > >>>>>> level giving the number of times each of the six sides have been > >>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to > 16 > >>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x > >> 16 = > >>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) > >> going > >>>>>> to the next Level > >>>>>> > >>>>> I see, and I assume these are pretty stretched elements. > >>>>> > >>>>> > >>>>>>> * what do you mean by Q1-Q2 elements? > >>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for > velocity > >>>>>> and (tri)linear for pressure > >>>>>> > >>>>>> I guess you could argue we could/should just do good old geometric > >>>>>> multigrid instead. More generally we do use this solver > configuration > >> a > >>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our > >>>>>> adaptive mesh runs - would it be worth to see if we have the same > >>>>>> performance issues with tetrahedral P2-P1? > >>>>>> > >>>>> No, you have a clear reproducer, if not minimal. > >>>>> The first coarsening is very different. > >>>>> > >>>>> I am working on this and I see that I added a heuristic for thin > bodies > >>>>> where you order the vertices in greedy algorithms with minimum degree > >> first. > >>>>> This will tend to pick corners first, edges then faces, etc. > >>>>> That may be the problem. I would like to understand it better (see > >> below). > >>>>> > >>>>> > >>>>>>> It would be nice to see if the new and old codes are similar > without > >>>>>>> aggressive coarsening. > >>>>>>> This was the intended change of the major change in this time frame > >> as > >>>>>> you > >>>>>>> noticed. > >>>>>>> If these jobs are easy to run, could you check that the old and new > >>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you > >> only > >>>>>> need > >>>>>>> one time step). > >>>>>>> All you need to do is check that the first coarse grid has about > the > >>>>>> same > >>>>>>> number of equations (large). > >>>>>> Unfortunately we're seeing some memory errors when we use this > option, > >>>>>> and I'm not entirely clear whether we're just running out of memory > >> and > >>>>>> need to put it on a special queue. 
> >>>>>> > >>>>>> The run with square_graph 0 using new PETSc managed to get through > one > >>>>>> solve at level 5, and is giving the following mg levels: > >>>>>> > >>>>>> rows=174, cols=174, bs=6 > >>>>>> total: nonzeros=30276, allocated nonzeros=30276 > >>>>>> -- > >>>>>> rows=2106, cols=2106, bs=6 > >>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 > >>>>>> -- > >>>>>> rows=21828, cols=21828, bs=6 > >>>>>> total: nonzeros=62588232, allocated nonzeros=62588232 > >>>>>> -- > >>>>>> rows=589824, cols=589824, bs=6 > >>>>>> total: nonzeros=1082528928, allocated > nonzeros=1082528928 > >>>>>> -- > >>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 > >>>>>> > >>>>>> comparing with square_graph 100 with new PETSc > >>>>>> > >>>>>> rows=96, cols=96, bs=6 > >>>>>> total: nonzeros=9216, allocated nonzeros=9216 > >>>>>> -- > >>>>>> rows=1440, cols=1440, bs=6 > >>>>>> total: nonzeros=647856, allocated nonzeros=647856 > >>>>>> -- > >>>>>> rows=97242, cols=97242, bs=6 > >>>>>> total: nonzeros=65656836, allocated nonzeros=65656836 > >>>>>> -- > >>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 > >>>>>> > >>>>>> and old PETSc with square_graph 100 > >>>>>> > >>>>>> rows=90, cols=90, bs=6 > >>>>>> total: nonzeros=8100, allocated nonzeros=8100 > >>>>>> -- > >>>>>> rows=1872, cols=1872, bs=6 > >>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 > >>>>>> -- > >>>>>> rows=47652, cols=47652, bs=6 > >>>>>> total: nonzeros=23343264, allocated nonzeros=23343264 > >>>>>> -- > >>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 > >>>>>> -- > >>>>>> > >>>>>> Unfortunately old PETSc with square_graph 0 did not complete a > single > >>>>>> solve before giving the memory error > >>>>>> > >>>>> OK, thanks for trying. > >>>>> > >>>>> I am working on this and I will give you a branch to test, but if you > >> can > >>>>> rebuild PETSc here is a quick test that might fix your problem. > >>>>> In src/ksp/pc/impls/gamg/agg.c you will see: > >>>>> > >>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); > >>>>> > >>>>> If you can comment this out in the new code and compare with the old, > >>>>> that might fix the problem. > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>>> > >>>>>>> BTW, I am starting to think I should add the old method back as an > >>>>>> option. > >>>>>>> I did not think this change would cause large differences. > >>>>>> Yes, I think that would be much appreciated. Let us know if we can > do > >>>>>> any testing > >>>>>> > >>>>>> Best wishes > >>>>>> Stephan > >>>>>> > >>>>>> > >>>>>>> Thanks, > >>>>>>> Mark > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> Note that we are providing the rigid body near nullspace, > >>>>>>>> hence the bs=3 to bs=6. > >>>>>>>> We have tried different values for the gamg_threshold but it > doesn't > >>>>>>>> really seem to significantly alter the coarsening amount in that > >> first > >>>>>>>> step. > >>>>>>>> > >>>>>>>> Do you have any suggestions for further things we should try/look > >> at? 
> >>>>>>>> Any feedback would be much appreciated > >>>>>>>> > >>>>>>>> Best wishes > >>>>>>>> Stephan Kramer > >>>>>>>> > >>>>>>>> Full logs including log_view timings available from > >>>>>>>> https://github.com/stephankramer/petsc-scaling/ > >>>>>>>> > >>>>>>>> In particular: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 4 11:12:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 4 Oct 2023 12:12:56 -0400 Subject: [petsc-users] Compute integral using DMPlexComputeIntegralFEM In-Reply-To: References: Message-ID: On Fri, Sep 8, 2023 at 6:26?PM David Andrs wrote: > Hi all! > > I am trying to use DMPlexComputeIntegralFEM to compute an integral > $\int_\Omega u d\Omega$. My domain is a square (-1, 1)^2 (2x2 QUAD4 > elements), I add first order Lagrange FE field on it, set the solution > vector (computed by a previous simulation). > > The value I am seeing computed by PETSc is -4, but the hand-calculated > value of this integral is -4.6. I also checked this in paraview using the > ?Integrate Variables? filter and it also returns -4.6 (this was to double > check that my hand-calculated value is correct). > Sorry it took so long. You caught me at a bad time. Something must be wrong with your analytic integrals. Here is me doing them by hand. You have a 3x3 vertex arrangement with coefficients 1 0 -3 -2 -1 -2 -3 0 1 >From the symmetry, the integrals of the cells along each diagonal must be equal. Now, the shape functions for Q_1 are (1 - x) y x y (1 - x)(1 - y) x (1 - y) Thus the integral for the lower left cell is \int^1_0 dx \int^1_0 dy -3 + 3 x + y - 2 xy = -3 + 3/2 + 1/2 - 2/4 = -1.5 which is also the upper right cell. The integral for the lower right cell is \int^1_0 dx \int^1_0 dy x - y - 2 xy = 1/2 - 1/2 - 2/4 = -1/2 which is also the upper left cell. Thus we get -1.5 - 1.5 - 0.5 - 0.5 = -4, which is what Plex gets. THanks, Matt So, I must be missing something obvious in my code. Attached is the minimal > PETSc code to show what I am doing. This is against PETSc 3.19.4. > > Thanks in advance for your help, > > David > > -- > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srvenkat at utexas.edu Wed Oct 4 18:02:37 2023 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Wed, 4 Oct 2023 18:02:37 -0500 Subject: [petsc-users] Scattering a vector to/from a subset of processors Message-ID: Suppose I am running on 12 processors, and I have a vector "v" of size 36 partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). 
I've been trying to use VecScatter to do this, but I'm not sure what IndexSets to use for the sender and receiver. The result I am trying to achieve is this: Assume the vector is v = <0, 1, 2, ..., 35> Start Finish Proc | Entries Proc | Entries 0 | 0,...,8 0 | 0, 1, 2 1 | 9,...,17 1 | 3, 4, 5 2 | 18,...,26 2 | 6, 7, 8 3 | 27,...,35 3 | 9, 10, 11 4 | None 4 | 12, 13, 14 5 | None 5 | 15, 16, 17 6 | None 6 | 18, 19, 20 7 | None 7 | 21, 22, 23 8 | None 8 | 24, 25, 26 9 | None 9 | 27, 28, 29 10 | None 10 | 30, 31, 32 11 | None 11 | 33, 34, 35 Appreciate any help you can provide on this. Thanks, Sreeram -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Oct 4 18:40:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 4 Oct 2023 18:40:50 -0500 Subject: [petsc-users] Scattering a vector to/from a subset of processors In-Reply-To: References: Message-ID: Hi, Sreeram, You can try this code. Since x, y are both MPI vectors, we just need to say we want to scatter x[0:N] to y[0:N]. The 12 index sets with your example on the 12 processes would be [0..8], [9..17], [18..26], [27..35], [], ..., []. Actually, you can do it arbitrarily, say, with 12 index sets [0..17], [18..35], .., []. PETSc will figure out how to do the communication. PetscInt rstart, rend, N; IS ix; VecScatter vscat; Vec y; MPI_Comm comm; VecType type; PetscObjectGetComm((PetscObject)x, &comm); VecGetType(x, &type); VecGetSize(x, &N); VecGetOwnershipRange(x, &rstart, &rend); VecCreate(comm, &y); VecSetSizes(y, PETSC_DECIDE, N); VecSetType(y, type); ISCreateStride(PetscObjectComm((PetscObject)x), rend - rstart, rstart, 1, &ix); VecScatterCreate(x, ix, y, ix, &vscat); --Junchao Zhang On Wed, Oct 4, 2023 at 6:03?PM Sreeram R Venkat wrote: > Suppose I am running on 12 processors, and I have a vector "v" of size 36 > partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has > a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it > over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). I've > been trying to use VecScatter to do this, but I'm not sure what IndexSets > to use for the sender and receiver. > > The result I am trying to achieve is this: > > Assume the vector is v = <0, 1, 2, ..., 35> > > Start Finish > Proc | Entries Proc | Entries > 0 | 0,...,8 0 | 0, 1, 2 > 1 | 9,...,17 1 | 3, 4, 5 > 2 | 18,...,26 2 | 6, 7, 8 > 3 | 27,...,35 3 | 9, 10, 11 > 4 | None 4 | 12, 13, 14 > 5 | None 5 | 15, 16, 17 > 6 | None 6 | 18, 19, 20 > 7 | None 7 | 21, 22, 23 > 8 | None 8 | 24, 25, 26 > 9 | None 9 | 27, 28, 29 > 10 | None 10 | 30, 31, 32 > 11 | None 11 | 33, 34, 35 > > Appreciate any help you can provide on this. > > Thanks, > Sreeram > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Thu Oct 5 06:09:11 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Thu, 5 Oct 2023 13:09:11 +0200 Subject: [petsc-users] Galerkin projection using petsc4py Message-ID: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> Hi everyone, I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. 
The error is

Phi.transposeMatMult(A, A1)
File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult
petsc4py.PETSc.Error: error code 56
[0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135
[0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989
[0] No support for this operation for this object type
[0] Call MatProductCreate() first

Do you know if these are exposed to petsc4py, or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel)

"""Experimenting with PETSc mat-mat multiplication"""

import time

import numpy as np
from colorama import Fore
from firedrake import COMM_SELF, COMM_WORLD
from firedrake.petsc import PETSc
from mpi4py import MPI
from numpy.testing import assert_array_almost_equal

from utilities import (
Print,
create_petsc_matrix,
)

nproc = COMM_WORLD.size
rank = COMM_WORLD.rank

# --------------------------------------------
# EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi
# A' = Phi.T * A * Phi
# [k x k] <- [k x m] x [m x m] x [m x k]
# --------------------------------------------

m, k = 11, 7
# Generate the random numpy matrices
np.random.seed(0) # sets the seed to 0
A_np = np.random.randint(low=0, high=6, size=(m, m))
Phi_np = np.random.randint(low=0, high=6, size=(m, k))

# Create A as an mpi matrix distributed on each process
A = create_petsc_matrix(A_np)

# Create Phi as an mpi matrix distributed on each process
Phi = create_petsc_matrix(Phi_np)

A1 = create_petsc_matrix(np.zeros((k, m)))

# Now A1 contains the result of Phi^T * A
Phi.transposeMatMult(A, A1)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From thanasis.boutsikakis at corintis.com Thu Oct 5 06:18:07 2023
From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis)
Date: Thu, 5 Oct 2023 13:18:07 +0200
Subject: [petsc-users] Galerkin projection using petsc4py
In-Reply-To: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com>
References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com>
Message-ID: <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com>

Sorry, forgot function create_petsc_matrix()

def create_petsc_matrix(input_array, sparse=True):
"""Create a PETSc matrix from an input_array

Args:
input_array (np array): Input array
partition_like (PETSc mat, optional): Petsc matrix. Defaults to None.
sparse (bool, optional): Toggle for sparse or dense. Defaults to True.
Returns: PETSc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix > On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: > > Hi everyone, > > I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is > > Phi.transposeMatMult(A, A1) > File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult > petsc4py.PETSc.Error: error code 56 > [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 > [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 > [0] No support for this operation for this object type > [0] Call MatProductCreate() first > > Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np) > > A1 = create_petsc_matrix(np.zeros((k, m))) > > # Now A1 contains the result of Phi^T * A > Phi.transposeMatMult(A, A1) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Thu Oct 5 06:22:11 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 5 Oct 2023 13:22:11 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: How about using ptap which will use MatPtAP? It will be more efficient (and it will help you bypass the issue). 
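A minimal sketch of what that suggestion could look like in petsc4py, reusing A and Phi from the quoted script (this is only an illustration; the exact call is spelled out further down the thread):

# assumes A (m x m) and Phi (m x k) are already assembled PETSc matrices, as above
A_prime = A.ptap(Phi)  # computes Phi^T * A * Phi and returns the k x k result

Note that ptap is called on A, takes Phi as its argument, and returns the product matrix, so no pre-allocated output matrix is needed.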
Thanks, Pierre > On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: > > Sorry, forgot function create_petsc_matrix() > > def create_petsc_matrix(input_array sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > >> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >> >> Hi everyone, >> >> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >> >> Phi.transposeMatMult(A, A1) >> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >> petsc4py.PETSc.Error: error code 56 >> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >> [0] No support for this operation for this object type >> [0] Call MatProductCreate() first >> >> Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np) >> >> A1 = create_petsc_matrix(np.zeros((k, m))) >> >> # Now A1 contains the result of Phi^T * A >> Phi.transposeMatMult(A, A1) >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thanasis.boutsikakis at corintis.com Thu Oct 5 07:02:16 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Thu, 5 Oct 2023 14:02:16 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 """Experimenting with PETSc mat-mat multiplication""" import time import numpy as np from colorama import Fore from firedrake import COMM_SELF, COMM_WORLD from firedrake.petsc import PETSc from mpi4py import MPI from numpy.testing import assert_array_almost_equal from utilities import ( Print, create_petsc_matrix, print_matrix_partitioning, ) nproc = COMM_WORLD.size rank = COMM_WORLD.rank # -------------------------------------------- # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi # A' = Phi.T * A * Phi # [k x k] <- [k x m] x [m x m] x [m x k] # -------------------------------------------- m, k = 11, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, m)) Phi_np = np.random.randint(low=0, high=6, size=(m, k)) # -------------------------------------------- # TEST: Galerking projection of numpy matrices A_np and Phi_np # -------------------------------------------- Aprime_np = Phi_np.T @ A_np @ Phi_np Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") Print(f"{Aprime_np}") # Create A as an mpi matrix distributed on each process A = create_petsc_matrix(A_np, sparse=False) # Create Phi as an mpi matrix distributed on each process Phi = create_petsc_matrix(Phi_np, sparse=False) # Create an empty PETSc matrix object to store the result of the PtAP operation. # This will hold the result A' = Phi.T * A * Phi after the computation. A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) # Perform the PtAP (Phi Transpose times A times Phi) operation. # In mathematical terms, this operation is A' = Phi.T * A * Phi. # A_prime will store the result of the operation. Phi.PtAP(A, A_prime) > On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: > > How about using ptap which will use MatPtAP? > It will be more efficient (and it will help you bypass the issue). > > Thanks, > Pierre > >> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >> >> Sorry, forgot function create_petsc_matrix() >> >> def create_petsc_matrix(input_array sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> PETSc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>> >>> Hi everyone, >>> >>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>> >>> Phi.transposeMatMult(A, A1) >>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>> petsc4py.PETSc.Error: error code 56 >>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>> [0] No support for this operation for this object type >>> [0] Call MatProductCreate() first >>> >>> Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) >>> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import time >>> >>> import numpy as np >>> from colorama import Fore >>> from firedrake import COMM_SELF, COMM_WORLD >>> from firedrake.petsc import PETSc >>> from mpi4py import MPI >>> from numpy.testing import assert_array_almost_equal >>> >>> from utilities import ( >>> Print, >>> create_petsc_matrix, >>> ) >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 11, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np) >>> >>> A1 = create_petsc_matrix(np.zeros((k, m))) >>> >>> # Now A1 contains the result of Phi^T * A >>> Phi.transposeMatMult(A, A1) >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at joliv.et Thu Oct 5 07:17:52 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 5 Oct 2023 14:17:52 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: Not a petsc4py expert here, but you may to try instead: A_prime = A.ptap(Phi) Thanks, Pierre > On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: > > Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? > > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. > Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > print_matrix_partitioning, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. > Phi.PtAP(A, A_prime) > >> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >> >> How about using ptap which will use MatPtAP? >> It will be more efficient (and it will help you bypass the issue). 
>> >> Thanks, >> Pierre >> >>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >>> >>> Sorry, forgot function create_petsc_matrix() >>> >>> def create_petsc_matrix(input_array sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>> >>> Returns: >>> PETSc mat: PETSc matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>> >>>> Hi everyone, >>>> >>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>> >>>> Phi.transposeMatMult(A, A1) >>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>> petsc4py.PETSc.Error: error code 56 >>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>> [0] No support for this operation for this object type >>>> [0] Call MatProductCreate() first >>>> >>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>> >>>> """Experimenting with PETSc mat-mat multiplication""" >>>> >>>> import time >>>> >>>> import numpy as np >>>> from colorama import Fore >>>> from firedrake import COMM_SELF, COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from mpi4py import MPI >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> from utilities import ( >>>> Print, >>>> create_petsc_matrix, >>>> ) >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> # -------------------------------------------- >>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>> # A' = Phi.T * A * Phi >>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>> # -------------------------------------------- >>>> >>>> m, k = 11, 7 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> >>>> # Create A as an mpi matrix distributed on each process >>>> A = create_petsc_matrix(A_np) >>>> >>>> # Create Phi as an mpi matrix distributed on each process >>>> Phi = create_petsc_matrix(Phi_np) >>>> >>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>> >>>> # Now A1 contains the result of Phi^T * A >>>> Phi.transposeMatMult(A, A1) >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Thu Oct 5 07:23:20 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Thu, 5 Oct 2023 14:23:20 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> Message-ID: <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> This works Pierre. Amazing input, thanks a lot! > On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: > > Not a petsc4py expert here, but you may to try instead: > A_prime = A.ptap(Phi) > > Thanks, > Pierre > >> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: >> >> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >> >> [0]PETSC ERROR: ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> print_matrix_partitioning, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >> Print(f"{Aprime_np}") >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=False) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=False) >> >> # Create an empty PETSc matrix object to store the result of the PtAP operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. >> Phi.PtAP(A, A_prime) >> >>> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >>> >>> How about using ptap which will use MatPtAP? >>> It will be more efficient (and it will help you bypass the issue). >>> >>> Thanks, >>> Pierre >>> >>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >>>> >>>> Sorry, forgot function create_petsc_matrix() >>>> >>>> def create_petsc_matrix(input_array sparse=True): >>>> """Create a PETSc matrix from an input_array >>>> >>>> Args: >>>> input_array (np array): Input array >>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>> >>>> Returns: >>>> PETSc mat: PETSc matrix >>>> """ >>>> # Check if input_array is 1D and reshape if necessary >>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>> global_rows, global_cols = input_array.shape >>>> >>>> size = ((None, global_rows), (global_cols, global_cols)) >>>> >>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>> if sparse: >>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>> else: >>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>> matrix.setUp() >>>> >>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>> >>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>> # Calculate the correct row in the array for the current process >>>> row_in_array = counter + local_rows_start >>>> matrix.setValues( >>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>> ) >>>> >>>> # Assembly the matrix to compute the final structure >>>> matrix.assemblyBegin() >>>> matrix.assemblyEnd() >>>> >>>> return matrix >>>> >>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>>> >>>>> Hi everyone, >>>>> >>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>> >>>>> Phi.transposeMatMult(A, A1) >>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>> petsc4py.PETSc.Error: error code 56 >>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>> [0] No support for this operation for this object type >>>>> [0] Call MatProductCreate() first >>>>> >>>>> Do you know if these exposed to petsc4py or maybe there is another way? I cannot get the MFE to work (neither in sequential nor in parallel) >>>>> >>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>> >>>>> import time >>>>> >>>>> import numpy as np >>>>> from colorama import Fore >>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>> from firedrake.petsc import PETSc >>>>> from mpi4py import MPI >>>>> from numpy.testing import assert_array_almost_equal >>>>> >>>>> from utilities import ( >>>>> Print, >>>>> create_petsc_matrix, >>>>> ) >>>>> >>>>> nproc = COMM_WORLD.size >>>>> rank = COMM_WORLD.rank >>>>> >>>>> # -------------------------------------------- >>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>> # A' = Phi.T * A * Phi >>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>> # -------------------------------------------- >>>>> >>>>> m, k = 11, 7 >>>>> # Generate the random numpy matrices >>>>> np.random.seed(0) # sets the seed to 0 >>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>> >>>>> # Create A as an mpi matrix distributed on each process >>>>> A = create_petsc_matrix(A_np) >>>>> >>>>> # Create Phi as an mpi matrix distributed on each process >>>>> Phi = create_petsc_matrix(Phi_np) >>>>> >>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>> >>>>> # Now A1 contains the result of Phi^T * A >>>>> Phi.transposeMatMult(A, A1) >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Thu Oct 5 10:58:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Oct 2023 10:58:50 -0500 Subject: [petsc-users] Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [image: Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang wrote: > Hi, Philip, > I will look into the tarballs and get back to you. > Thanks. > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> We finally have xolotl ported to use the new COO interface and the >> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port >> to our previous version (using MatSetValuesStencil and the default Mat and >> Vec implementations), we expected to see an improvement in performance for >> both the "serial" and "cuda" builds (here I'm referring to the kokkos >> configuration). >> >> Attached are two plots that show timings for three different cases. All >> of these were run on Ascent (the Summit-like training system) with 6 MPI >> tasks (on a single node). The CUDA cases were given one GPU per task (and >> used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all >> cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as >> consistent as possible. >> >> The performance of RHSJacobian (where the bulk of computation happens in >> xolotl) behaved basically as expected (better than expected in the serial >> build). NE_3 case in CUDA was the only one that performed worse, but not >> surprisingly, since its workload for the GPUs is much smaller. We've still >> got more optimization to do on this. >> >> The real surprise was how much worse the overall solve times were. This >> seems to be due simply to switching to the kokkos-based implementation. I'm >> wondering if there are any changes we can make in configuration or runtime >> arguments to help with PETSc's performance here. Any help looking into this >> would be appreciated. >> >> The tarballs linked here >> >> and here >> >> are profiling databases which, once extracted, can be viewed with >> hpcviewer. I don't know how helpful that will be, but hopefully it can give >> you some direction. >> >> Thanks for your help, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: From srvenkat at utexas.edu Thu Oct 5 12:57:00 2023 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Thu, 5 Oct 2023 12:57:00 -0500 Subject: [petsc-users] Scattering a vector to/from a subset of processors In-Reply-To: References: Message-ID: Thank you. This works for me. Sreeram On Wed, Oct 4, 2023 at 6:41?PM Junchao Zhang wrote: > Hi, Sreeram, > You can try this code. Since x, y are both MPI vectors, we just need to > say we want to scatter x[0:N] to y[0:N]. 
The 12 index sets with your > example on the 12 processes would be [0..8], [9..17], [18..26], [27..35], > [], ..., []. Actually, you can do it arbitrarily, say, with 12 index sets > [0..17], [18..35], .., []. PETSc will figure out how to do the > communication. > > PetscInt rstart, rend, N; > IS ix; > VecScatter vscat; > Vec y; > MPI_Comm comm; > VecType type; > > PetscObjectGetComm((PetscObject)x, &comm); > VecGetType(x, &type); > VecGetSize(x, &N); > VecGetOwnershipRange(x, &rstart, &rend); > > VecCreate(comm, &y); > VecSetSizes(y, PETSC_DECIDE, N); > VecSetType(y, type); > > ISCreateStride(PetscObjectComm((PetscObject)x), rend - rstart, rstart, 1, > &ix); > VecScatterCreate(x, ix, y, ix, &vscat); > > --Junchao Zhang > > > On Wed, Oct 4, 2023 at 6:03?PM Sreeram R Venkat > wrote: > >> Suppose I am running on 12 processors, and I have a vector "v" of size 36 >> partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has >> a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it >> over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). I've >> been trying to use VecScatter to do this, but I'm not sure what IndexSets >> to use for the sender and receiver. >> >> The result I am trying to achieve is this: >> >> Assume the vector is v = <0, 1, 2, ..., 35> >> >> Start Finish >> Proc | Entries Proc | Entries >> 0 | 0,...,8 0 | 0, 1, 2 >> 1 | 9,...,17 1 | 3, 4, 5 >> 2 | 18,...,26 2 | 6, 7, 8 >> 3 | 27,...,35 3 | 9, 10, 11 >> 4 | None 4 | 12, 13, 14 >> 5 | None 5 | 15, 16, 17 >> 6 | None 6 | 18, 19, 20 >> 7 | None 7 | 21, 22, 23 >> 8 | None 8 | 24, 25, 26 >> 9 | None 9 | 27, 28, 29 >> 10 | None 10 | 30, 31, 32 >> 11 | None 11 | 33, 34, 35 >> >> Appreciate any help you can provide on this. >> >> Thanks, >> Sreeram >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrsd at gmail.com Thu Oct 5 15:59:58 2023 From: andrsd at gmail.com (David Andrs) Date: Thu, 5 Oct 2023 14:59:58 -0600 Subject: [petsc-users] Compute integral using DMPlexComputeIntegralFEM In-Reply-To: References: Message-ID: Hi Matt! Thanks for getting back to this. I found a mistake in my hand calculation - I used wrong shape functions (pretty stupid mistake :face_palm:). Thanks again for your help, David On Wed, Oct 4, 2023 at 10:13?AM Matthew Knepley wrote: > On Fri, Sep 8, 2023 at 6:26?PM David Andrs wrote: > >> Hi all! >> >> I am trying to use DMPlexComputeIntegralFEM to compute an integral >> $\int_\Omega u d\Omega$. My domain is a square (-1, 1)^2 (2x2 QUAD4 >> elements), I add first order Lagrange FE field on it, set the solution >> vector (computed by a previous simulation). >> >> The value I am seeing computed by PETSc is -4, but the hand-calculated >> value of this integral is -4.6. I also checked this in paraview using the >> ?Integrate Variables? filter and it also returns -4.6 (this was to double >> check that my hand-calculated value is correct). >> > > Sorry it took so long. You caught me at a bad time. > > Something must be wrong with your analytic integrals. Here is me doing > them by hand. You have a 3x3 vertex arrangement with coefficients > > 1 0 -3 > -2 -1 -2 > -3 0 1 > > From the symmetry, the integrals of the cells along each diagonal must be > equal. Now, the shape functions for Q_1 are > > (1 - x) y x y > > (1 - x)(1 - y) x (1 - y) > > Thus the integral for the lower left cell is > > \int^1_0 dx \int^1_0 dy -3 + 3 x + y - 2 xy = -3 + 3/2 + 1/2 - 2/4 = -1.5 > > which is also the upper right cell. 
The integral for the lower right cell > is > > \int^1_0 dx \int^1_0 dy x - y - 2 xy = 1/2 - 1/2 - 2/4 = -1/2 > > which is also the upper left cell. Thus we get -1.5 - 1.5 - 0.5 - 0.5 = > -4, which is what Plex gets. > > THanks, > > Matt > > So, I must be missing something obvious in my code. Attached is the >> minimal PETSc code to show what I am doing. This is against PETSc 3.19.4. >> >> Thanks in advance for your help, >> >> David >> >> -- >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Oct 5 16:29:30 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 5 Oct 2023 16:29:30 -0500 Subject: [petsc-users] Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang wrote: > Hi, Philip, > I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() > instead of the COO interface? MatSetValues() needs to copy the data from > device to host and thus is expensive. > Do you have profiling results with COO enabled? > > [image: Screenshot 2023-10-05 at 10.55.29?AM.png] > > > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: > >> Hi, Philip, >> I will look into the tarballs and get back to you. >> Thanks. >> --Junchao Zhang >> >> >> On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> We finally have xolotl ported to use the new COO interface and the >>> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port >>> to our previous version (using MatSetValuesStencil and the default Mat and >>> Vec implementations), we expected to see an improvement in performance for >>> both the "serial" and "cuda" builds (here I'm referring to the kokkos >>> configuration). >>> >>> Attached are two plots that show timings for three different cases. All >>> of these were run on Ascent (the Summit-like training system) with 6 MPI >>> tasks (on a single node). The CUDA cases were given one GPU per task (and >>> used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all >>> cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as >>> consistent as possible. >>> >>> The performance of RHSJacobian (where the bulk of computation happens in >>> xolotl) behaved basically as expected (better than expected in the serial >>> build). NE_3 case in CUDA was the only one that performed worse, but not >>> surprisingly, since its workload for the GPUs is much smaller. We've still >>> got more optimization to do on this. >>> >>> The real surprise was how much worse the overall solve times were. This >>> seems to be due simply to switching to the kokkos-based implementation. I'm >>> wondering if there are any changes we can make in configuration or runtime >>> arguments to help with PETSc's performance here. Any help looking into this >>> would be appreciated. >>> >>> The tarballs linked here >>> >>> and here >>> >>> are profiling databases which, once extracted, can be viewed with >>> hpcviewer. 
I don't know how helpful that will be, but hopefully it can give >>> you some direction. >>> >>> Thanks for your help, >>> >>> >>> *Philip Fackler * >>> Research Software Engineer, Application Engineering Group >>> Advanced Computing Systems Research Section >>> Computer Science and Mathematics Division >>> *Oak Ridge National Laboratory* >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: From facklerpw at ornl.gov Thu Oct 5 16:52:09 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Thu, 5 Oct 2023 21:52:09 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Aha! That makes sense. Thank you. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. 
The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: Screenshot 2023-10-05 at 10.55.29?AM.png URL: From kenneth.c.hall at duke.edu Thu Oct 5 17:37:04 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Thu, 5 Oct 2023 22:37:04 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) Message-ID: Hi all, I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: !.... Create matrix-free operators for A and B PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) !.... Create nonlinear eigensolver PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) !.... Set the problem type PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) ! !.... set the solver type PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) ! !.... Set functions and Jacobians for NEP PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. Any help on getting this problem properly set up would be greatly appreciated. Kenneth Hall ATTACHMENTS: test_nep.f90 code_output -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: code_output Type: application/octet-stream Size: 3674 bytes Desc: code_output URL: -------------- next part -------------- A non-text attachment was scrubbed... 
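[Editor's note: for readers following the shell-matrix discussion from petsc4py rather than Fortran, a matrix-free operator analogous to the 10x10 toy A above can be sketched with a Python-type Mat. This is an illustrative, sequential (COMM_SELF) toy with assumed names, not Kenneth's attached test_nep.f90, and it says nothing about the NEP crash itself; it only shows the shell / matrix-free idea in petsc4py.]

import numpy as np
from petsc4py import PETSc

n = 10
rng = np.random.default_rng(0)
A_np = rng.standard_normal((n, n)).astype(PETSc.ScalarType)

class MatFreeA:
    """Matrix-free context: applies y = A_np @ x without storing A as a PETSc Mat."""
    def mult(self, mat, x, y):
        xx = x.getArray(readonly=True)   # sequential toy: the whole vector is local
        y.setArray(A_np @ xx)

A = PETSc.Mat().createPython(((None, n), (None, n)), comm=PETSc.COMM_SELF)
A.setPythonContext(MatFreeA())
A.setUp()

x = A.createVecRight()
y = A.createVecLeft()
x.set(1.0)
A.mult(x, y)   # y = A_np @ ones, dispatched through the Python context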
Name: test_nep.f90 Type: application/octet-stream Size: 7440 bytes Desc: test_nep.f90 URL: From s.kramer at imperial.ac.uk Thu Oct 5 18:22:48 2023 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Fri, 6 Oct 2023 10:22:48 +1100 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> Message-ID: <0aec9ffa-ccc1-9481-47c7-c32e69903f45@imperial.ac.uk> Great, that seems to fix the issue indeed - i.e. on the branch with the low memory filtering switched off (by default) we no longer see the "inconsistent data" error or hangs, and going back to the square graph aggressive coarsening brings us back the old performance. So we'd be keen to have that branch merged indeed Many thanks for your assistance with this Stephan On 05/10/2023 01:11, Mark Adams wrote: > Thanks Stephan, > > It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat > is waiting for messages that were not sent. A bug. > > Can you try my branch, which is ready to merge, adams/gamg-fast-filter. > We added a new filtering method in main that uses low memory but I found it > was slow, so this branch brings back the old filter code, used by default, > and keeps the low memory version as an option. > It is possible this low memory filtering messed up the internals of the Mat > in some way. > I hope this is it, but if not we can continue. > > This MR also makes square graph the default. > I have found it does create better aggregates and on GPUs, with Kokkos bug > fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs) > > Mark > > > > > On Wed, Oct 4, 2023 at 12:30?AM Stephan Kramer > wrote: > >> Hi Mark >> >> Thanks again for re-enabling the square graph aggressive coarsening >> option which seems to have restored performance for most of our cases. >> Unfortunately we do have a remaining issue, which only seems to occur >> for the larger mesh size ("level 7" which has 6,389,890 vertices and we >> normally run on 1536 cpus): we either get a "Petsc has generated >> inconsistent data" error, or a hang - both when constructing the square >> graph matrix. So this is with the new >> -pc_gamg_aggressive_square_graph=true option, without the option there's >> no error but of course we would get back to the worse performance. >> >> Backtrace for the "inconsistent data" error. 
Note this is actually just >> petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge >> of adams/gamg-add-old-coarsening into main - with one unrelated commit >> from firedrake >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Petsc has generated inconsistent data >> [0]PETSC ERROR: j 8 not equal to expected number of sends 9 >> [0]PETSC ERROR: Petsc Development GIT revision: >> v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100 >> [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named >> gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023 >> [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix >> --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 >> --with-fortran-bindings=0 --with-zlib --with-c2html=0 >> --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=mpifort --download-hdf5 --download-hypre >> --download-superlu_dist --download-ptscotch --download-suitesparse >> --download-pastix --download-hwloc --download-metis --download-scalapack >> --download-mumps --download-chaco --download-ml >> CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 >> [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at >> /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 >> [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 >> [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 >> [0]PETSC ERROR: #4 MatProductSymbolic() at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 >> [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 >> [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 >> [0]PETSC ERROR: #7 PCSetUp_GAMG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 >> [0]PETSC ERROR: #8 PCSetUp() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 >> [0]PETSC ERROR: #9 PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 >> [0]PETSC ERROR: #10 PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 >> [0]PETSC ERROR: #11 KSP_PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 >> [0]PETSC ERROR: #12 KSPSolve_CG() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 >> [0]PETSC ERROR: #13 KSPSolve_Private() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 >> [0]PETSC ERROR: #14 KSPSolve() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 >> [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at >> >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 >> [0]PETSC ERROR: #16 PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 >> [0]PETSC ERROR: #17 KSP_PCApply() at >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 >> [0]PETSC ERROR: #18 KSPSolve_PREONLY() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 >> [0]PETSC ERROR: #19 KSPSolve_Private() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 >> [0]PETSC ERROR: #20 KSPSolve() at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 >> [0]PETSC ERROR: 
#21 SNESSolve_KSPONLY() at >> /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 >> [0]PETSC ERROR: #22 SNESSolve() at >> /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 >> >> Last -info :pc messages: >> >> [0] PCSetUp(): Setting up PC for first time >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) >> N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 >> [0] PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in >> graph (1.588710e+07 1.765233e+06) >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: >> Square Graph on level 1 >> [0] fixAggregatesWithSquare(): isMPI = yes >> [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: >> New grid 380144 nodes >> [0] PCGAMGOptProlongator_AGG(): >> Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 >> min=9.015236e-02 PC=jacobi >> [0] PCGAMGOptProlongator_AGG(): >> Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra >> 0.0901524 4.48938 >> [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: >> Coarse grid reduction from 1536 to 1536 active processes >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) >> N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes >> [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in >> graph (5.310360e+05 5.353000e+03) >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: >> Square Graph on level 2 >> >> The hang (on a slightly different model configuration but on the same >> mesh and n/o cores) seems to occur in the same location. If I use gdb to >> attach to the running processes, it seems on some cores it has somehow >> manages to fall out of the pcsetup and is waiting in the first norm >> calculation in the outside CG iteration: >> >> #0 0x000014cce9999119 in >> hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from >> /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so >> #1 0x000014ccef2c2737 in _coll_ml_allreduce () from >> /apps/hcoll/4.7.3202/lib/libhcoll.so.1 >> #2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, >> rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , >> op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) >> at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 >> #3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, >> recvbuf=, count=1, datatype=, >> op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 >> #4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, >> type=, z=, VecNorm_SeqFn=) >> at >> >> /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 >> #5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 >> #6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, >> val=0x22d) at >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 >> #7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 >> etc. 
>> >> but with other cores still stuck at: >> >> #0 0x000015375cf41e8a in ucp_worker_progress () from >> /apps/ucx/1.12.0/lib/libucp.so.0 >> #1 0x000015377d4bd57b in opal_progress () at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 >> #2 0x000015377d4c3ba5 in ompi_sync_wait_mt >> (sync=sync at entry=0x7ffd6aedf6f0) at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 >> #3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, >> requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at >> >> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 >> #4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, >> indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 >> #5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ >> (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 >> #6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ >> (C=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 >> #7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 >> #8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, >> Gmat1=0x1, Gmat2=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 >> #9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, >> a_Gmat1=0x1, agg_lists=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 >> #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 >> #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 >> #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 >> #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply >> (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, >> __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 >> #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS >> (func=0x15378f302890, args=0x83b3218, nargsf=, >> kwnames=) at ../Objects/descrobject.c:405 >> #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, >> nargsf=, args=0x83b3218, callable=0x15378f302890, >> tstate=0x23e0020) at ../Include/cpython/abstract.h:114 >> #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, >> args=0x83b3218, callable=0x15378f302890) at >> ../Include/cpython/abstract.h:123 >> #17 call_function (kwnames=0x0, oparg=, >> pp_stack=, trace_info=0x7ffd6aee0390, >> tstate=) at ../Python/ceval.c:5867 >> #18 _PyEval_EvalFrameDefault (tstate=, f=, >> throwflag=) at ../Python/ceval.c:4198 >> #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, >> tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 >> #20 _PyEval_Vector (tstate=, con=, >> locals=, args=, argcount=4, >> kwnames=) at ../Python/ceval.c:5065 >> #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=> out>, args=0x1, _nargs=, kwargs=) at >> src/petsc4py/PETSc.c:548022 >> #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, >> __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 >> #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at >> 
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 >> #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, >> y=0xc0fe132c) at >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 >> #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 >> >> Let me know if there is anything further we can try to debug this issue >> >> Kind regards >> Stephan Kramer >> >> >> On 02/09/2023 01:58, Mark Adams wrote: >>> Fantastic! >>> >>> I fixed a memory free problem. You should be OK now. >>> I am pretty sure you are good but I would like to wait to get any >> feedback >>> from you. >>> We should have a release at the end of the month and it would be nice to >>> get this into it. >>> >>> Thanks, >>> Mark >>> >>> >>> On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer >>> wrote: >>> >>>> Hi Mark >>>> >>>> Sorry took a while to report back. We have tried your branch but hit a >>>> few issues, some of which we're not entirely sure are related. >>>> >>>> First switching off minimum degree ordering, and then switching to the >>>> old version of aggressive coarsening, as you suggested, got us back to >>>> the coarsening behaviour that we had previously, but then we also >>>> observed an even further worsening of the iteration count: it had >>>> previously gone up by 50% already (with the newer main petsc), but now >>>> was more than double "old" petsc. Took us a while to realize this was >>>> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. >>>> Switching this also back to the old default we get back to very similar >>>> coarsening levels (see below for more details if it is of interest) and >>>> iteration counts. >>>> >>>> So that's all very good news. However, we were also starting seeing >>>> memory errors (double free or corruption) when we switched off the >>>> minimum degree ordering. Because this was at an earlier version of your >>>> branch we then rebuild, hoping this was just an earlier bug that had >>>> been fixed, but then we were having MPI-lockup issues. We have now >>>> figured out the MPI issues are completely unrelated - some combination >>>> with a newer mpi build and firedrake on our cluster which also occur >>>> using main branches of everything. So switching back to an older MPI >>>> build we are hoping to now test your most recent version of >>>> adams/gamg-add-old-coarsening with these options and see whether the >>>> memory errors are still there. 
Will let you know >>>> >>>> Best wishes >>>> Stephan Kramer >>>> >>>> Coarsening details with various options for Level 6 of the test case: >>>> >>>> In our original setup (using "old" petsc), we had: >>>> >>>> rows=516, cols=516, bs=6 >>>> rows=12660, cols=12660, bs=6 >>>> rows=346974, cols=346974, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> Then with the newer main petsc we had >>>> >>>> rows=666, cols=666, bs=6 >>>> rows=7740, cols=7740, bs=6 >>>> rows=34902, cols=34902, bs=6 >>>> rows=736578, cols=736578, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> Then on your branch with minimum_degree_ordering False: >>>> >>>> rows=504, cols=504, bs=6 >>>> rows=2274, cols=2274, bs=6 >>>> rows=11010, cols=11010, bs=6 >>>> rows=35790, cols=35790, bs=6 >>>> rows=430686, cols=430686, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> And with minimum_degree_ordering False and use_aggressive_square_graph >>>> True: >>>> >>>> rows=498, cols=498, bs=6 >>>> rows=12672, cols=12672, bs=6 >>>> rows=346974, cols=346974, bs=6 >>>> rows=19169670, cols=19169670, bs=3 >>>> >>>> So that is indeed pretty much back to what it was before >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 31/08/2023 23:40, Mark Adams wrote: >>>>> Hi Stephan, >>>>> >>>>> This branch is settling down. adams/gamg-add-old-coarsening >>>>> < >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening> >>>>> I made the old, not minimum degree, ordering the default but kept the >> new >>>>> "aggressive" coarsening as the default, so I am hoping that just adding >>>>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests >> will >>>>> get you back to where you were before. >>>>> Fingers crossed ... let me know if you have any success or not. >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: >>>>> >>>>>> Hi Stephan, >>>>>> >>>>>> I have a branch that you can try: adams/gamg-add-old-coarsening >>>>>> < >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening >>>>>> Things to test: >>>>>> * First, verify that nothing unintended changed by reproducing your >> bad >>>>>> results with this branch (the defaults are the same) >>>>>> * Try not using the minimum degree ordering that I suggested >>>>>> with: -pc_gamg_use_minimum_degree_ordering false >>>>>> -- I am eager to see if that is the main problem. >>>>>> * Go back to what I think is the old method: >>>>>> -pc_gamg_use_minimum_degree_ordering >>>>>> false -pc_gamg_use_aggressive_square_graph true >>>>>> >>>>>> When we get back to where you were, I would like to try to get modern >>>>>> stuff working. >>>>>> I did add a -pc_gamg_aggressive_mis_k <2> >>>>>> You could to another step of MIS coarsening with >>>> -pc_gamg_aggressive_mis_k >>>>>> 3 >>>>>> >>>>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters. >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: >>>>>> >>>>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < >>>> s.kramer at imperial.ac.uk> >>>>>>> wrote: >>>>>>> >>>>>>>> Many thanks for looking into this, Mark >>>>>>>>> My 3D tests were not that different and I see you lowered the >>>>>>>> threshold. >>>>>>>>> Note, you can set the threshold to zero, but your test is running >> so >>>>>>>> much >>>>>>>>> differently than mine there is something else going on. >>>>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to >> shoot >>>>>>>> for >>>>>>>>> in 3D. 
>>>>>>>>> >>>>>>>>> So it is not clear what the problem is. Some questions: >>>>>>>>> >>>>>>>>> * do you have a picture of this mesh to show me? >>>>>>>> It's just a standard hexahedral cubed sphere mesh with the >> refinement >>>>>>>> level giving the number of times each of the six sides have been >>>>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to >> 16 >>>>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x >>>> 16 = >>>>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) >>>> going >>>>>>>> to the next Level >>>>>>>> >>>>>>> I see, and I assume these are pretty stretched elements. >>>>>>> >>>>>>> >>>>>>>>> * what do you mean by Q1-Q2 elements? >>>>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for >> velocity >>>>>>>> and (tri)linear for pressure >>>>>>>> >>>>>>>> I guess you could argue we could/should just do good old geometric >>>>>>>> multigrid instead. More generally we do use this solver >> configuration >>>> a >>>>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our >>>>>>>> adaptive mesh runs - would it be worth to see if we have the same >>>>>>>> performance issues with tetrahedral P2-P1? >>>>>>>> >>>>>>> No, you have a clear reproducer, if not minimal. >>>>>>> The first coarsening is very different. >>>>>>> >>>>>>> I am working on this and I see that I added a heuristic for thin >> bodies >>>>>>> where you order the vertices in greedy algorithms with minimum degree >>>> first. >>>>>>> This will tend to pick corners first, edges then faces, etc. >>>>>>> That may be the problem. I would like to understand it better (see >>>> below). >>>>>>> >>>>>>>>> It would be nice to see if the new and old codes are similar >> without >>>>>>>>> aggressive coarsening. >>>>>>>>> This was the intended change of the major change in this time frame >>>> as >>>>>>>> you >>>>>>>>> noticed. >>>>>>>>> If these jobs are easy to run, could you check that the old and new >>>>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you >>>> only >>>>>>>> need >>>>>>>>> one time step). >>>>>>>>> All you need to do is check that the first coarse grid has about >> the >>>>>>>> same >>>>>>>>> number of equations (large). >>>>>>>> Unfortunately we're seeing some memory errors when we use this >> option, >>>>>>>> and I'm not entirely clear whether we're just running out of memory >>>> and >>>>>>>> need to put it on a special queue. 
>>>>>>>> >>>>>>>> The run with square_graph 0 using new PETSc managed to get through >> one >>>>>>>> solve at level 5, and is giving the following mg levels: >>>>>>>> >>>>>>>> rows=174, cols=174, bs=6 >>>>>>>> total: nonzeros=30276, allocated nonzeros=30276 >>>>>>>> -- >>>>>>>> rows=2106, cols=2106, bs=6 >>>>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 >>>>>>>> -- >>>>>>>> rows=21828, cols=21828, bs=6 >>>>>>>> total: nonzeros=62588232, allocated nonzeros=62588232 >>>>>>>> -- >>>>>>>> rows=589824, cols=589824, bs=6 >>>>>>>> total: nonzeros=1082528928, allocated >> nonzeros=1082528928 >>>>>>>> -- >>>>>>>> rows=2433222, cols=2433222, bs=3 >>>>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>>>> >>>>>>>> comparing with square_graph 100 with new PETSc >>>>>>>> >>>>>>>> rows=96, cols=96, bs=6 >>>>>>>> total: nonzeros=9216, allocated nonzeros=9216 >>>>>>>> -- >>>>>>>> rows=1440, cols=1440, bs=6 >>>>>>>> total: nonzeros=647856, allocated nonzeros=647856 >>>>>>>> -- >>>>>>>> rows=97242, cols=97242, bs=6 >>>>>>>> total: nonzeros=65656836, allocated nonzeros=65656836 >>>>>>>> -- >>>>>>>> rows=2433222, cols=2433222, bs=3 >>>>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>>>> >>>>>>>> and old PETSc with square_graph 100 >>>>>>>> >>>>>>>> rows=90, cols=90, bs=6 >>>>>>>> total: nonzeros=8100, allocated nonzeros=8100 >>>>>>>> -- >>>>>>>> rows=1872, cols=1872, bs=6 >>>>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 >>>>>>>> -- >>>>>>>> rows=47652, cols=47652, bs=6 >>>>>>>> total: nonzeros=23343264, allocated nonzeros=23343264 >>>>>>>> -- >>>>>>>> rows=2433222, cols=2433222, bs=3 >>>>>>>> total: nonzeros=456526098, allocated nonzeros=456526098 >>>>>>>> -- >>>>>>>> >>>>>>>> Unfortunately old PETSc with square_graph 0 did not complete a >> single >>>>>>>> solve before giving the memory error >>>>>>>> >>>>>>> OK, thanks for trying. >>>>>>> >>>>>>> I am working on this and I will give you a branch to test, but if you >>>> can >>>>>>> rebuild PETSc here is a quick test that might fix your problem. >>>>>>> In src/ksp/pc/impls/gamg/agg.c you will see: >>>>>>> >>>>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); >>>>>>> >>>>>>> If you can comment this out in the new code and compare with the old, >>>>>>> that might fix the problem. >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>>>> BTW, I am starting to think I should add the old method back as an >>>>>>>> option. >>>>>>>>> I did not think this change would cause large differences. >>>>>>>> Yes, I think that would be much appreciated. Let us know if we can >> do >>>>>>>> any testing >>>>>>>> >>>>>>>> Best wishes >>>>>>>> Stephan >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mark >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Note that we are providing the rigid body near nullspace, >>>>>>>>>> hence the bs=3 to bs=6. >>>>>>>>>> We have tried different values for the gamg_threshold but it >> doesn't >>>>>>>>>> really seem to significantly alter the coarsening amount in that >>>> first >>>>>>>>>> step. >>>>>>>>>> >>>>>>>>>> Do you have any suggestions for further things we should try/look >>>> at? 
>>>>>>>>>> Any feedback would be much appreciated >>>>>>>>>> >>>>>>>>>> Best wishes >>>>>>>>>> Stephan Kramer >>>>>>>>>> >>>>>>>>>> Full logs including log_view timings available from >>>>>>>>>> https://github.com/stephankramer/petsc-scaling/ >>>>>>>>>> >>>>>>>>>> In particular: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> From mfadams at lbl.gov Thu Oct 5 19:51:38 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 5 Oct 2023 20:51:38 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: <0aec9ffa-ccc1-9481-47c7-c32e69903f45@imperial.ac.uk> References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> <99896e04-7ac2-9e92-0922-e78f2d0c710d@imperial.ac.uk> <0b512a75-d6ae-8a3f-1478-970b700c008a@imperial.ac.uk> <0aec9ffa-ccc1-9481-47c7-c32e69903f45@imperial.ac.uk> Message-ID: Fantastic, it will get merged soon. Thank you for your diligence and patience. This would have been a time bomb waiting to explode. Mark On Thu, Oct 5, 2023 at 7:23?PM Stephan Kramer wrote: > Great, that seems to fix the issue indeed - i.e. on the branch with the > low memory filtering switched off (by default) we no longer see the > "inconsistent data" error or hangs, and going back to the square graph > aggressive coarsening brings us back the old performance. So we'd be > keen to have that branch merged indeed > Many thanks for your assistance with this > Stephan > > On 05/10/2023 01:11, Mark Adams wrote: > > Thanks Stephan, > > > > It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat > > is waiting for messages that were not sent. A bug. > > > > Can you try my branch, which is ready to merge, adams/gamg-fast-filter. > > We added a new filtering method in main that uses low memory but I found > it > > was slow, so this branch brings back the old filter code, used by > default, > > and keeps the low memory version as an option. > > It is possible this low memory filtering messed up the internals of the > Mat > > in some way. > > I hope this is it, but if not we can continue. > > > > This MR also makes square graph the default. > > I have found it does create better aggregates and on GPUs, with Kokkos > bug > > fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs) > > > > Mark > > > > > > > > > > On Wed, Oct 4, 2023 at 12:30?AM Stephan Kramer > > wrote: > > > >> Hi Mark > >> > >> Thanks again for re-enabling the square graph aggressive coarsening > >> option which seems to have restored performance for most of our cases. > >> Unfortunately we do have a remaining issue, which only seems to occur > >> for the larger mesh size ("level 7" which has 6,389,890 vertices and we > >> normally run on 1536 cpus): we either get a "Petsc has generated > >> inconsistent data" error, or a hang - both when constructing the square > >> graph matrix. So this is with the new > >> -pc_gamg_aggressive_square_graph=true option, without the option there's > >> no error but of course we would get back to the worse performance. > >> > >> Backtrace for the "inconsistent data" error. 
Note this is actually just > >> petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge > >> of adams/gamg-add-old-coarsening into main - with one unrelated commit > >> from firedrake > >> > >> [0]PETSC ERROR: --------------------- Error Message > >> -------------------------------------------------------------- > >> [0]PETSC ERROR: Petsc has generated inconsistent data > >> [0]PETSC ERROR: j 8 not equal to expected number of sends 9 > >> [0]PETSC ERROR: Petsc Development GIT revision: > >> v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100 > >> [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named > >> gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023 > >> [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix > >> --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 > >> --with-fortran-bindings=0 --with-zlib --with-c2html=0 > >> --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx > >> --with-fc=mpifort --download-hdf5 --download-hypre > >> --download-superlu_dist --download-ptscotch --download-suitesparse > >> --download-pastix --download-hwloc --download-metis --download-scalapack > >> --download-mumps --download-chaco --download-ml > >> CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441 > >> [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at > >> /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270 > >> [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867 > >> [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > >> [0]PETSC ERROR: #4 MatProductSymbolic() at > >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > >> [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > >> [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > >> [0]PETSC ERROR: #7 PCSetUp_GAMG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > >> [0]PETSC ERROR: #8 PCSetUp() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > >> [0]PETSC ERROR: #9 PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > >> [0]PETSC ERROR: #10 PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > >> [0]PETSC ERROR: #11 KSP_PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > >> [0]PETSC ERROR: #12 KSPSolve_CG() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > >> [0]PETSC ERROR: #13 KSPSolve_Private() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 > >> [0]PETSC ERROR: #14 KSPSolve() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > >> [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at > >> > >> > /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175 > >> [0]PETSC ERROR: #16 PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > >> [0]PETSC ERROR: #17 KSP_PCApply() at > >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > >> [0]PETSC ERROR: #18 KSPSolve_PREONLY() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25 > >> [0]PETSC ERROR: #19 KSPSolve_Private() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910 
> >> [0]PETSC ERROR: #20 KSPSolve() at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082 > >> [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at > >> /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49 > >> [0]PETSC ERROR: #22 SNESSolve() at > >> /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635 > >> > >> Last -info :pc messages: > >> > >> [0] PCSetUp(): Setting up PC for first time > >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) > >> N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536 > >> [0] PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in > >> graph (1.588710e+07 1.765233e+06) > >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > >> Square Graph on level 1 > >> [0] fixAggregatesWithSquare(): isMPI = yes > >> [0] PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: > >> New grid 380144 nodes > >> [0] PCGAMGOptProlongator_AGG(): > >> Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 > >> min=9.015236e-02 PC=jacobi > >> [0] PCGAMGOptProlongator_AGG(): > >> Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra > >> 0.0901524 4.48938 > >> [0] PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: > >> Coarse grid reduction from 1536 to 1536 active processes > >> [0] PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) > >> N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes > >> [0] PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in > >> graph (5.310360e+05 5.353000e+03) > >> [0] PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: > >> Square Graph on level 2 > >> > >> The hang (on a slightly different model configuration but on the same > >> mesh and n/o cores) seems to occur in the same location. If I use gdb to > >> attach to the running processes, it seems on some cores it has somehow > >> manages to fall out of the pcsetup and is waiting in the first norm > >> calculation in the outside CG iteration: > >> > >> #0 0x000014cce9999119 in > >> hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from > >> /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so > >> #1 0x000014ccef2c2737 in _coll_ml_allreduce () from > >> /apps/hcoll/4.7.3202/lib/libhcoll.so.1 > >> #2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, > >> rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 , > >> op=0x14cd26cfbc20 , comm=0x3076fb0, module=0x43a0110) > >> at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228 > >> #3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, > >> recvbuf=, count=1, datatype=, > >> op=0x14cd26cfbc20 , comm=0x3076fb0) at pallreduce.c:113 > >> #4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=, > >> type=, z=, VecNorm_SeqFn=) > >> at > >> > >> > /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168 > >> #5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at > >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39 > >> #6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, > >> val=0x22d) at > >> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214 > >> #7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163 > >> etc. 
> >> > >> but with other cores still stuck at: > >> > >> #0 0x000015375cf41e8a in ucp_worker_progress () from > >> /apps/ucx/1.12.0/lib/libucp.so.0 > >> #1 0x000015377d4bd57b in opal_progress () at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231 > >> #2 0x000015377d4c3ba5 in ompi_sync_wait_mt > >> (sync=sync at entry=0x7ffd6aedf6f0) at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85 > >> #3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, > >> requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at > >> > >> > /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124 > >> #4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, > >> indx=0x7ffd6aedfa60, status=) at pwaitany.c:86 > >> #5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ > >> (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884 > >> #6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ > >> (C=0x2cc7500) at > >> > /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071 > >> #7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795 > >> #8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, > >> Gmat1=0x1, Gmat2=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489 > >> #9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, > >> a_Gmat1=0x1, agg_lists=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969 > >> #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645 > >> #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069 > >> #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484 > >> #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply > >> (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, > >> __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082 > >> #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS > >> (func=0x15378f302890, args=0x83b3218, nargsf=, > >> kwnames=) at ../Objects/descrobject.c:405 > >> #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, > >> nargsf=, args=0x83b3218, callable=0x15378f302890, > >> tstate=0x23e0020) at ../Include/cpython/abstract.h:114 > >> #16 PyObject_Vectorcall (kwnames=0x0, nargsf=, > >> args=0x83b3218, callable=0x15378f302890) at > >> ../Include/cpython/abstract.h:123 > >> #17 call_function (kwnames=0x0, oparg=, > >> pp_stack=, trace_info=0x7ffd6aee0390, > >> tstate=) at ../Python/ceval.c:5867 > >> #18 _PyEval_EvalFrameDefault (tstate=, f=, > >> throwflag=) at ../Python/ceval.c:4198 > >> #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, > >> tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46 > >> #20 _PyEval_Vector (tstate=, con=, > >> locals=, args=, argcount=4, > >> kwnames=) at ../Python/ceval.c:5065 > >> #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func= >> out>, args=0x1, _nargs=, kwargs=) at > >> src/petsc4py/PETSc.c:548022 > >> #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, > >> __pyx_v_x=0x1, 
__pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979 > >> #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487 > >> #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, > >> y=0xc0fe132c) at > >> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383 > >> #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at > >> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162 > >> > >> Let me know if there is anything further we can try to debug this issue > >> > >> Kind regards > >> Stephan Kramer > >> > >> > >> On 02/09/2023 01:58, Mark Adams wrote: > >>> Fantastic! > >>> > >>> I fixed a memory free problem. You should be OK now. > >>> I am pretty sure you are good but I would like to wait to get any > >> feedback > >>> from you. > >>> We should have a release at the end of the month and it would be nice > to > >>> get this into it. > >>> > >>> Thanks, > >>> Mark > >>> > >>> > >>> On Fri, Sep 1, 2023 at 7:07?AM Stephan Kramer > > >>> wrote: > >>> > >>>> Hi Mark > >>>> > >>>> Sorry took a while to report back. We have tried your branch but hit a > >>>> few issues, some of which we're not entirely sure are related. > >>>> > >>>> First switching off minimum degree ordering, and then switching to the > >>>> old version of aggressive coarsening, as you suggested, got us back to > >>>> the coarsening behaviour that we had previously, but then we also > >>>> observed an even further worsening of the iteration count: it had > >>>> previously gone up by 50% already (with the newer main petsc), but now > >>>> was more than double "old" petsc. Took us a while to realize this was > >>>> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. > >>>> Switching this also back to the old default we get back to very > similar > >>>> coarsening levels (see below for more details if it is of interest) > and > >>>> iteration counts. > >>>> > >>>> So that's all very good news. However, we were also starting seeing > >>>> memory errors (double free or corruption) when we switched off the > >>>> minimum degree ordering. Because this was at an earlier version of > your > >>>> branch we then rebuild, hoping this was just an earlier bug that had > >>>> been fixed, but then we were having MPI-lockup issues. We have now > >>>> figured out the MPI issues are completely unrelated - some combination > >>>> with a newer mpi build and firedrake on our cluster which also occur > >>>> using main branches of everything. So switching back to an older MPI > >>>> build we are hoping to now test your most recent version of > >>>> adams/gamg-add-old-coarsening with these options and see whether the > >>>> memory errors are still there. 
Will let you know > >>>> > >>>> Best wishes > >>>> Stephan Kramer > >>>> > >>>> Coarsening details with various options for Level 6 of the test case: > >>>> > >>>> In our original setup (using "old" petsc), we had: > >>>> > >>>> rows=516, cols=516, bs=6 > >>>> rows=12660, cols=12660, bs=6 > >>>> rows=346974, cols=346974, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> Then with the newer main petsc we had > >>>> > >>>> rows=666, cols=666, bs=6 > >>>> rows=7740, cols=7740, bs=6 > >>>> rows=34902, cols=34902, bs=6 > >>>> rows=736578, cols=736578, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> Then on your branch with minimum_degree_ordering False: > >>>> > >>>> rows=504, cols=504, bs=6 > >>>> rows=2274, cols=2274, bs=6 > >>>> rows=11010, cols=11010, bs=6 > >>>> rows=35790, cols=35790, bs=6 > >>>> rows=430686, cols=430686, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> And with minimum_degree_ordering False and use_aggressive_square_graph > >>>> True: > >>>> > >>>> rows=498, cols=498, bs=6 > >>>> rows=12672, cols=12672, bs=6 > >>>> rows=346974, cols=346974, bs=6 > >>>> rows=19169670, cols=19169670, bs=3 > >>>> > >>>> So that is indeed pretty much back to what it was before > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On 31/08/2023 23:40, Mark Adams wrote: > >>>>> Hi Stephan, > >>>>> > >>>>> This branch is settling down. adams/gamg-add-old-coarsening > >>>>> < > >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening> > >>>>> I made the old, not minimum degree, ordering the default but kept the > >> new > >>>>> "aggressive" coarsening as the default, so I am hoping that just > adding > >>>>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests > >> will > >>>>> get you back to where you were before. > >>>>> Fingers crossed ... let me know if you have any success or not. > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>>> > >>>>> On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: > >>>>> > >>>>>> Hi Stephan, > >>>>>> > >>>>>> I have a branch that you can try: adams/gamg-add-old-coarsening > >>>>>> < > >> https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening > >>>>>> Things to test: > >>>>>> * First, verify that nothing unintended changed by reproducing your > >> bad > >>>>>> results with this branch (the defaults are the same) > >>>>>> * Try not using the minimum degree ordering that I suggested > >>>>>> with: -pc_gamg_use_minimum_degree_ordering false > >>>>>> -- I am eager to see if that is the main problem. > >>>>>> * Go back to what I think is the old method: > >>>>>> -pc_gamg_use_minimum_degree_ordering > >>>>>> false -pc_gamg_use_aggressive_square_graph true > >>>>>> > >>>>>> When we get back to where you were, I would like to try to get > modern > >>>>>> stuff working. > >>>>>> I did add a -pc_gamg_aggressive_mis_k <2> > >>>>>> You could to another step of MIS coarsening with > >>>> -pc_gamg_aggressive_mis_k > >>>>>> 3 > >>>>>> > >>>>>> Anyway, lots to look at but, alas, AMG does have a lot of > parameters. > >>>>>> > >>>>>> Thanks, > >>>>>> Mark > >>>>>> > >>>>>> On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: > >>>>>> > >>>>>>> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer < > >>>> s.kramer at imperial.ac.uk> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Many thanks for looking into this, Mark > >>>>>>>>> My 3D tests were not that different and I see you lowered the > >>>>>>>> threshold. 
> >>>>>>>>> Note, you can set the threshold to zero, but your test is running > >> so > >>>>>>>> much > >>>>>>>>> differently than mine there is something else going on. > >>>>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to > >> shoot > >>>>>>>> for > >>>>>>>>> in 3D. > >>>>>>>>> > >>>>>>>>> So it is not clear what the problem is. Some questions: > >>>>>>>>> > >>>>>>>>> * do you have a picture of this mesh to show me? > >>>>>>>> It's just a standard hexahedral cubed sphere mesh with the > >> refinement > >>>>>>>> level giving the number of times each of the six sides have been > >>>>>>>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to > >> 16 > >>>>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 > x > >>>> 16 = > >>>>>>>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) > >>>> going > >>>>>>>> to the next Level > >>>>>>>> > >>>>>>> I see, and I assume these are pretty stretched elements. > >>>>>>> > >>>>>>> > >>>>>>>>> * what do you mean by Q1-Q2 elements? > >>>>>>>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for > >> velocity > >>>>>>>> and (tri)linear for pressure > >>>>>>>> > >>>>>>>> I guess you could argue we could/should just do good old geometric > >>>>>>>> multigrid instead. More generally we do use this solver > >> configuration > >>>> a > >>>>>>>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our > >>>>>>>> adaptive mesh runs - would it be worth to see if we have the same > >>>>>>>> performance issues with tetrahedral P2-P1? > >>>>>>>> > >>>>>>> No, you have a clear reproducer, if not minimal. > >>>>>>> The first coarsening is very different. > >>>>>>> > >>>>>>> I am working on this and I see that I added a heuristic for thin > >> bodies > >>>>>>> where you order the vertices in greedy algorithms with minimum > degree > >>>> first. > >>>>>>> This will tend to pick corners first, edges then faces, etc. > >>>>>>> That may be the problem. I would like to understand it better (see > >>>> below). > >>>>>>> > >>>>>>>>> It would be nice to see if the new and old codes are similar > >> without > >>>>>>>>> aggressive coarsening. > >>>>>>>>> This was the intended change of the major change in this time > frame > >>>> as > >>>>>>>> you > >>>>>>>>> noticed. > >>>>>>>>> If these jobs are easy to run, could you check that the old and > new > >>>>>>>>> versions are similar with "-pc_gamg_square_graph 0 ", ( and you > >>>> only > >>>>>>>> need > >>>>>>>>> one time step). > >>>>>>>>> All you need to do is check that the first coarse grid has about > >> the > >>>>>>>> same > >>>>>>>>> number of equations (large). > >>>>>>>> Unfortunately we're seeing some memory errors when we use this > >> option, > >>>>>>>> and I'm not entirely clear whether we're just running out of > memory > >>>> and > >>>>>>>> need to put it on a special queue. 
> >>>>>>>> > >>>>>>>> The run with square_graph 0 using new PETSc managed to get through > >> one > >>>>>>>> solve at level 5, and is giving the following mg levels: > >>>>>>>> > >>>>>>>> rows=174, cols=174, bs=6 > >>>>>>>> total: nonzeros=30276, allocated nonzeros=30276 > >>>>>>>> -- > >>>>>>>> rows=2106, cols=2106, bs=6 > >>>>>>>> total: nonzeros=4238532, allocated nonzeros=4238532 > >>>>>>>> -- > >>>>>>>> rows=21828, cols=21828, bs=6 > >>>>>>>> total: nonzeros=62588232, allocated > nonzeros=62588232 > >>>>>>>> -- > >>>>>>>> rows=589824, cols=589824, bs=6 > >>>>>>>> total: nonzeros=1082528928, allocated > >> nonzeros=1082528928 > >>>>>>>> -- > >>>>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>>>> total: nonzeros=456526098, allocated > nonzeros=456526098 > >>>>>>>> > >>>>>>>> comparing with square_graph 100 with new PETSc > >>>>>>>> > >>>>>>>> rows=96, cols=96, bs=6 > >>>>>>>> total: nonzeros=9216, allocated nonzeros=9216 > >>>>>>>> -- > >>>>>>>> rows=1440, cols=1440, bs=6 > >>>>>>>> total: nonzeros=647856, allocated nonzeros=647856 > >>>>>>>> -- > >>>>>>>> rows=97242, cols=97242, bs=6 > >>>>>>>> total: nonzeros=65656836, allocated > nonzeros=65656836 > >>>>>>>> -- > >>>>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>>>> total: nonzeros=456526098, allocated > nonzeros=456526098 > >>>>>>>> > >>>>>>>> and old PETSc with square_graph 100 > >>>>>>>> > >>>>>>>> rows=90, cols=90, bs=6 > >>>>>>>> total: nonzeros=8100, allocated nonzeros=8100 > >>>>>>>> -- > >>>>>>>> rows=1872, cols=1872, bs=6 > >>>>>>>> total: nonzeros=1234080, allocated nonzeros=1234080 > >>>>>>>> -- > >>>>>>>> rows=47652, cols=47652, bs=6 > >>>>>>>> total: nonzeros=23343264, allocated > nonzeros=23343264 > >>>>>>>> -- > >>>>>>>> rows=2433222, cols=2433222, bs=3 > >>>>>>>> total: nonzeros=456526098, allocated > nonzeros=456526098 > >>>>>>>> -- > >>>>>>>> > >>>>>>>> Unfortunately old PETSc with square_graph 0 did not complete a > >> single > >>>>>>>> solve before giving the memory error > >>>>>>>> > >>>>>>> OK, thanks for trying. > >>>>>>> > >>>>>>> I am working on this and I will give you a branch to test, but if > you > >>>> can > >>>>>>> rebuild PETSc here is a quick test that might fix your problem. > >>>>>>> In src/ksp/pc/impls/gamg/agg.c you will see: > >>>>>>> > >>>>>>> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); > >>>>>>> > >>>>>>> If you can comment this out in the new code and compare with the > old, > >>>>>>> that might fix the problem. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Mark > >>>>>>> > >>>>>>> > >>>>>>>>> BTW, I am starting to think I should add the old method back as > an > >>>>>>>> option. > >>>>>>>>> I did not think this change would cause large differences. > >>>>>>>> Yes, I think that would be much appreciated. Let us know if we can > >> do > >>>>>>>> any testing > >>>>>>>> > >>>>>>>> Best wishes > >>>>>>>> Stephan > >>>>>>>> > >>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Mark > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> Note that we are providing the rigid body near nullspace, > >>>>>>>>>> hence the bs=3 to bs=6. > >>>>>>>>>> We have tried different values for the gamg_threshold but it > >> doesn't > >>>>>>>>>> really seem to significantly alter the coarsening amount in that > >>>> first > >>>>>>>>>> step. > >>>>>>>>>> > >>>>>>>>>> Do you have any suggestions for further things we should > try/look > >>>> at? 
> >>>>>>>>>> Any feedback would be much appreciated > >>>>>>>>>> > >>>>>>>>>> Best wishes > >>>>>>>>>> Stephan Kramer > >>>>>>>>>> > >>>>>>>>>> Full logs including log_view timings available from > >>>>>>>>>> https://github.com/stephankramer/petsc-scaling/ > >>>>>>>>>> > >>>>>>>>>> In particular: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat > >> > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Oct 6 06:01:08 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 6 Oct 2023 13:01:08 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). Jose > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > Hi all, > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > !.... Create matrix-free operators for A and B > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > !.... Create nonlinear eigensolver > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > !.... Set the problem type > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > ! > !.... set the solver type > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > ! > !.... Set functions and Jacobians for NEP > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > Any help on getting this problem properly set up would be greatly appreciated. 
> > Kenneth Hall > ATTACHMENTS: > test_nep.f90 > code_output > > From hongzhang at anl.gov Fri Oct 6 08:15:12 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 6 Oct 2023 13:15:12 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: <7839BCF7-8FEC-4AAA-94FF-AABEB42586BC@anl.gov> I noticed that you are using ARKIMEX in the code. A temporary workaround you can try is to disable adaptive time stepping, e.g. by using the option -ts_adapt_type none. Then MatShift() will not be called when the Jacobians are computed. Hong (Mr.) On Oct 5, 2023, at 4:52 PM, Fackler, Philip via petsc-users wrote: Aha! That makes sense. Thank you. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. 
Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.c.hall at duke.edu Fri Oct 6 09:56:29 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Fri, 6 Oct 2023 14:56:29 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: Jose, Thanks for this. I will try this and report back. Kenneth From: Jose E. Roman Date: Friday, October 6, 2023 at 7:01 AM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). Jose > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > Hi all, > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > !.... Create matrix-free operators for A and B > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > !.... Create nonlinear eigensolver > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > !.... Set the problem type > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > ! > !.... set the solver type > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > ! > !.... Set functions and Jacobians for NEP > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > Any help on getting this problem properly set up would be greatly appreciated. > > Kenneth Hall > ATTACHMENTS: > test_nep.f90 > code_output > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kenneth.c.hall at duke.edu Fri Oct 6 15:28:14 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Fri, 6 Oct 2023 20:28:14 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: Jose, Unfortunately, I was unable to implement the MATOP_DUPLICATE operation in fortran (and I do not know enough c to work in c). Here is the error message I get: [0]PETSC ERROR: #1 MatShellSetOperation_Fortran() at /Users/hall/Documents/Fortran_Codes/Packages/petsc/src/mat/impls/shell/ftn-custom/zshellf.c:283 [0]PETSC ERROR: #2 src/test_nep.f90:62 When I look at zshellf.c, MATOP_DUPLICATE is not one of the supported operations. See below. Kenneth /** * Subset of MatOperation that is supported by the Fortran wrappers. */ enum FortranMatOperation { FORTRAN_MATOP_MULT = 0, FORTRAN_MATOP_MULT_ADD = 1, FORTRAN_MATOP_MULT_TRANSPOSE = 2, FORTRAN_MATOP_MULT_TRANSPOSE_ADD = 3, FORTRAN_MATOP_SOR = 4, FORTRAN_MATOP_TRANSPOSE = 5, FORTRAN_MATOP_GET_DIAGONAL = 6, FORTRAN_MATOP_DIAGONAL_SCALE = 7, FORTRAN_MATOP_ZERO_ENTRIES = 8, FORTRAN_MATOP_AXPY = 9, FORTRAN_MATOP_SHIFT = 10, FORTRAN_MATOP_DIAGONAL_SET = 11, FORTRAN_MATOP_DESTROY = 12, FORTRAN_MATOP_VIEW = 13, FORTRAN_MATOP_CREATE_VECS = 14, FORTRAN_MATOP_GET_DIAGONAL_BLOCK = 15, FORTRAN_MATOP_COPY = 16, FORTRAN_MATOP_SCALE = 17, FORTRAN_MATOP_SET_RANDOM = 18, FORTRAN_MATOP_ASSEMBLY_BEGIN = 19, FORTRAN_MATOP_ASSEMBLY_END = 20, FORTRAN_MATOP_SIZE = 21 }; From: Jose E. Roman Date: Friday, October 6, 2023 at 7:01 AM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). Jose > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > Hi all, > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > !.... Create matrix-free operators for A and B > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > !.... Create nonlinear eigensolver > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > !.... Set the problem type > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > ! > !.... set the solver type > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > ! > !.... 
Set functions and Jacobians for NEP > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > Any help on getting this problem properly set up would be greatly appreciated. > > Kenneth Hall > ATTACHMENTS: > test_nep.f90 > code_output > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Fri Oct 6 16:40:12 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Fri, 6 Oct 2023 16:40:12 -0500 Subject: [petsc-users] 'nvcc -show' Error for configure with NVCC Message-ID: Hello, I am trying to configure PETSc(current release version) with NVCC, with these options: ./configure --with-cc=nvcc --with-cxx=nvcc --with-fc=0 --with-cuda=1 However, I got error like: --------------------------------------------------------------------------------------------- Could not execute "['nvcc -show']": nvcc fatal : Unknown option '-show' ********************************************************************************************* I wonder where this -show option comes from? It seems safe to disable this option. Thanks, Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Oct 6 16:50:03 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 6 Oct 2023 16:50:03 -0500 (CDT) Subject: [petsc-users] 'nvcc -show' Error for configure with NVCC In-Reply-To: References: Message-ID: <28d271d8-f320-e982-5cbb-1e2bf50893bb@mcs.anl.gov> On Fri, 6 Oct 2023, Qiyue Lu wrote: > Hello, > I am trying to configure PETSc(current release version) with NVCC, with > these options: > ./configure --with-cc=nvcc --with-cxx=nvcc --with-fc=0 --with-cuda=1 this usage is incorrect. You need: --with-cc=mpicc --with-cxx=mpicxx --with-cudac=nvcc --with-cuda=1 Satish > > However, I got error like: > --------------------------------------------------------------------------------------------- > Could not execute "['nvcc -show']": > nvcc fatal : Unknown option '-show' > ********************************************************************************************* > > I wonder where this -show option comes from? It seems safe to disable this > option. > > Thanks, > Qiyue Lu > From Roland.Richter at empa.ch Mon Oct 9 07:32:16 2023 From: Roland.Richter at empa.ch (Richter, Roland) Date: Mon, 9 Oct 2023 12:32:16 +0000 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails Message-ID: Hei, I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. 
Therefore, I adapted the configure-line as follow: ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 The configuration, however, fails with The CMAKE_C_COMPILER: mpiicc -cc=icx is not a full path and was not found in the PATH for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? Thank you! Regards, Roland Richter -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3969230 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 7926 bytes Desc: not available URL: From junchao.zhang at gmail.com Mon Oct 9 09:23:28 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 9 Oct 2023 09:23:28 -0500 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: References: Message-ID: Could you just use "--with-cc=mpiicx --with-cxx=mpiicpx" ? In addition, you can export environment vars I_MPI_CC=icx and I_MPI_CXX=icpx to specify the underlying compilers. --Junchao Zhang On Mon, Oct 9, 2023 at 7:33?AM Richter, Roland wrote: > Hei, > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with > Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc > -cc=icx" as C-compiler, as described by > https://stackoverflow.com/a/76362396. 
Therefore, I adapted the > configure-line as follow: > > > > *./configure --prefix=/media/storage/local_opt/petsc > --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc > -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC > -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true > --with-mpi=true > --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ > --with-openmp=true --download-hdf5=yes --download-netcdf=yes > --download-chaco=no --download-metis=yes --download-slepc=yes > --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes > --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 > --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost > --download-opencascade=yes --with-fftw=1 > --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes > --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 > --download-muparser=no --download-p4est=yes --download-sowing=yes > --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1* > > > > The configuration, however, fails with > > > > *The CMAKE_C_COMPILER:* > > > > * mpiicc -cc=icx* > > > > * is not a full path and was not found in the PATH* > > > > for all additional modules which use a cmake-based configuration approach > (such as OPENCASCADE). How could I solve that problem? > > > > Thank you! > > Regards, > > Roland Richter > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Oct 9 09:23:55 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 9 Oct 2023 10:23:55 -0400 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: References: Message-ID: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev> Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) and export I_MPI_CC=icx export I_MPI_CXX=icpx export I_MPI_F90=ifx > On Oct 9, 2023, at 8:32 AM, Richter, Roland wrote: > > Hei, > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. 
Therefore, I adapted the configure-line as follow: > > ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 > > The configuration, however, fails with > > The CMAKE_C_COMPILER: > > mpiicc -cc=icx > > is not a full path and was not found in the PATH > > for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? > > Thank you! > Regards, > Roland Richter > -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From Pierre.LEDAC at cea.fr Mon Oct 9 09:35:49 2023
From: Pierre.LEDAC at cea.fr (LEDAC Pierre)
Date: Mon, 9 Oct 2023 14:35:49 +0000
Subject: [petsc-users] PETSc 3.14 to PETSc 3.20: Different (slower) convergence for classical AMG (sequential and especially in parallel)
Message-ID: <4c9f02898f324fcd8be1fe5dcc9f0416@cea.fr>

Hello all,

I am struggling to get the same convergence in iterations when using classical algebraic multigrid in my code with PETSc 3.20 compared to PETSc 3.14. In order to solve a Poisson system, I am using:

-ksp_type cg -pc_type gamg -pc_gamg_type classical

I read the release notes of the versions between 3.15 and 3.20:

https://petsc.org/release/changes/317
https://petsc.org/main/manualpages/PC/PCGAMGSetThreshold/

And had a look at the archive mailing list (especially this one: https://www.mail-archive.com/petsc-users at mcs.anl.gov/msg46688.html), so I added some other options to try to get the same behaviour as with PETSc 3.14:

-ksp_type cg -pc_type gamg -pc_gamg_type classical -mg_levels_pc_type sor -pc_gamg_threshold 0.

It improves the convergence, but the convergence is still different (26 vs 18 iterations). On another of my test cases, the number of levels is also different (e.g. 6 vs 4), and here it is the same but with a different coarsening, according to the output from the -ksp_view option. The main point is that the convergence dramatically degrades in parallel on a third test case, so unfortunately I can't upgrade to PETSc 3.20 for now.

I send you the partial report (petsc_314_vs_petsc_320.ksp_view) with -ksp_view (left PETSc 3.14, right PETSc 3.20) and the configure/command line options used (in the petsc_XXX_petsc.TU files).

Could my issue be related to the following 3.18 changes? I have not tried the first one (see the sketch below).

* Remove PCGAMGSetSymGraph() and -pc_gamg_sym_graph. The user should now indicate symmetry and structural symmetry using MatSetOption() and GAMG will symmetrize the graph if a symmetric options is not set
* Change -pc_gamg_reuse_interpolation default from false to true. 
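For reference, a minimal sketch (in C, illustrative only, not my actual code) of how I understand the first change, i.e. passing the symmetry information through MatSetOption() on the assembled Poisson matrix (called A here) before the solve:

    /* Tell PETSc/GAMG that the operator is symmetric; this replaces the
       removed PCGAMGSetSymGraph() / -pc_gamg_sym_graph path.
       A is assumed to be the assembled Poisson matrix. */
    PetscCall(MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE));
    /* Optionally state that the symmetry is never lost by later
       assemblies, so the flag does not need to be set again. */
    PetscCall(MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE));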
Any advice would be greatly appreciated,

Pierre LEDAC
Commissariat à l'énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 - point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_320_petsc.TU
Type: application/octet-stream
Size: 15358 bytes
Desc: petsc_320_petsc.TU
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_314_petsc.TU
Type: application/octet-stream
Size: 14761 bytes
Desc: petsc_314_petsc.TU
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_314_vs_petsc_320.ksp_view
Type: application/octet-stream
Size: 30948 bytes
Desc: petsc_314_vs_petsc_320.ksp_view
URL: 

From balay at mcs.anl.gov Mon Oct 9 10:29:08 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Mon, 9 Oct 2023 10:29:08 -0500 (CDT)
Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails
In-Reply-To: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev>
References: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev>
Message-ID: <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov>

Will note - OneAPI MPI usage is documented at https://petsc.org/release/install/install/#mpi

Satish

On Mon, 9 Oct 2023, Barry Smith wrote:

> > Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) and > > export I_MPI_CC=icx > export I_MPI_CXX=icpx > export I_MPI_F90=ifx > > > > On Oct 9, 2023, at 8:32 AM, Richter, Roland wrote: > > > > Hei, > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. Therefore, I adapted the configure-line as follow: > > > > ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 > > > > The configuration, however, fails with > > > > The CMAKE_C_COMPILER: > > > > mpiicc -cc=icx > > > > is not a full path and was not found in the PATH > > > > for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? > > > > Thank you! 
> > Regards, > > Roland Richter > > > > From yc17470 at connect.um.edu.mo Tue Oct 10 08:27:57 2023 From: yc17470 at connect.um.edu.mo (Gong Yujie) Date: Tue, 10 Oct 2023 13:27:57 +0000 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI Message-ID: Dear PETSc developers, I installed OpenMPI3 first and then installed PETSc with that mpi. Currently, I'm facing a scalability issue, in detail, I tested that using OpenMPI to calculate an addition of two distributed arrays and I get a good scalability. The problem is when I calculate the addition of two vectors in PETSc, I don't have any scalability. For the same size of the problem, PETSc costs a lot much time than merely using OpenMPI. My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. Best Regards, Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 10 08:54:27 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Oct 2023 09:54:27 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: Message-ID: On Tue, Oct 10, 2023 at 9:28?AM Gong Yujie wrote: > Dear PETSc developers, > > I installed OpenMPI3 first and then installed PETSc with that mpi. > Currently, I'm facing a scalability issue, in detail, I tested that using > OpenMPI to calculate an addition of two distributed arrays and I get a good > scalability. The problem is when I calculate the addition of two vectors in > PETSc, I don't have any scalability. For the same size of the problem, > PETSc costs a lot much time than merely using OpenMPI. > > My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you > can give me some suggestions. > 1. For any performance question, we really need to see the output of -log_view for each run. 2. I am not sure I understand your question. Vector addition does not involve communication. Thus it will scale perfectly in the absence of load imbalance. Thanks, Matt > Best Regards, > Yujie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 10 08:59:53 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Oct 2023 09:59:53 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: Message-ID: <764C6422-14C5-4A19-97A3-36BEB80690FB@petsc.dev> Take a look at https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup Check the binding that OpenMPI is using (by the way, there are much more recent OpenMPI versions, I suggest using them). Run the STREAMS benchmark as indicated on that page. Barry > On Oct 10, 2023, at 9:27 AM, Gong Yujie wrote: > > Dear PETSc developers, > > I installed OpenMPI3 first and then installed PETSc with that mpi. Currently, I'm facing a scalability issue, in detail, I tested that using OpenMPI to calculate an addition of two distributed arrays and I get a good scalability. The problem is when I calculate the addition of two vectors in PETSc, I don't have any scalability. For the same size of the problem, PETSc costs a lot much time than merely using OpenMPI. 
> > My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. > > Best Regards, > Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 10 09:39:08 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Oct 2023 10:39:08 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: <764C6422-14C5-4A19-97A3-36BEB80690FB@petsc.dev> Message-ID: Run STREAMS with MPI_BINDING="-map-by socket --bind-to core --report-bindings" make mpistreams send the result Also run lscpu numactl -H if they are available on your machine, send the result > On Oct 10, 2023, at 10:17 AM, Gong Yujie wrote: > > Dear Barry, > > I tried to use the binding as suggested by PETSc: > mpiexec -n 4 --map-by socket --bind-to socket --report-bindings > But it seems not improving the performance. Here is the make stream log > > Best Regards, > Yujie > > mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -I/home/tt/petsc-3.16.0/include -I/home/tt/petsc-3.16.0/arch-linux-c-opt/include `pwd`/MPIVersion.c > Running streams with 'mpiexec --oversubscribe ' using 'NPMAX=16' > 1 26119.1937 Rate (MB/s) > 2 29833.4281 Rate (MB/s) 1.1422 > 3 65338.5050 Rate (MB/s) 2.50155 > 4 59832.7482 Rate (MB/s) 2.29076 > 5 48629.8396 Rate (MB/s) 1.86184 > 6 58569.4289 Rate (MB/s) 2.24239 > 7 63827.1144 Rate (MB/s) 2.44369 > 8 57448.5349 Rate (MB/s) 2.19948 > 9 61405.3273 Rate (MB/s) 2.35097 > 10 68021.6111 Rate (MB/s) 2.60428 > 11 71289.0422 Rate (MB/s) 2.72937 > 12 76900.6386 Rate (MB/s) 2.94422 > 13 80198.6807 Rate (MB/s) 3.07049 > 14 64846.3685 Rate (MB/s) 2.48271 > 15 83072.8631 Rate (MB/s) 3.18053 > 16 70128.0166 Rate (MB/s) 2.68492 > ------------------------------------------------ > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:47: mpistream] Error 1 (ignored) > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:79: mpistreams] Error 1 (ignored) > From: Barry Smith > Sent: Tuesday, October 10, 2023 9:59 PM > To: Gong Yujie > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Scalability problem using PETSc with local installed OpenMPI > > > Take a look at https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup > > Check the binding that OpenMPI is using (by the way, there are much more recent OpenMPI versions, I suggest using them). Run the STREAMS benchmark as indicated on that page. > > Barry > > >> On Oct 10, 2023, at 9:27 AM, Gong Yujie wrote: >> >> Dear PETSc developers, >> >> I installed OpenMPI3 first and then installed PETSc with that mpi. Currently, I'm facing a scalability issue, in detail, I tested that using OpenMPI to calculate an addition of two distributed arrays and I get a good scalability. The problem is when I calculate the addition of two vectors in PETSc, I don't have any scalability. 
For the same size of the problem, PETSc costs a lot much time than merely using OpenMPI. >> >> My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. >> >> Best Regards, >> Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 10 10:10:56 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Oct 2023 11:10:56 -0400 Subject: [petsc-users] Scalability problem using PETSc with local installed OpenMPI In-Reply-To: References: <764C6422-14C5-4A19-97A3-36BEB80690FB@petsc.dev> Message-ID: <0BFABF42-4509-488D-AF88-4559A4ACA14D@petsc.dev> This tells me you cannot realistically expect for large PETSc problems to get much more than a speedup of 2 on this system using two or four MPI processes; there simply is not more memory bandwidth available. There are two NUMA regions and a single core largely saturates a region. Also, always running MPI with the binding is important to get that small speedup. Barry > On Oct 10, 2023, at 10:47 AM, Gong Yujie wrote: > > Here is the result from STREAMS, lscpu and numactl. > > > -----------------------------lscpu------------------------------------ > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 128 > On-line CPU(s) list: 0-127 > Thread(s) per core: 1 > Core(s) per socket: 64 > Socket(s): 2 > NUMA node(s): 2 > Vendor ID: AuthenticAMD > CPU family: 23 > Model: 49 > Model name: AMD EPYC 7702 64-Core Processor > Stepping: 0 > CPU MHz: 1996.019 > BogoMIPS: 3992.03 > Virtualization: AMD-V > L1d cache: 32K > L1i cache: 32K > L2 cache: 512K > L3 cache: 16384K > NUMA node0 CPU(s): 0-63 > NUMA node1 CPU(s): 64-127 > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca > > ----------------------------numactl -H----------------------------- > available: 2 nodes (0-1) > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 > node 0 size: 128418 MB > node 0 free: 123340 MB > node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 > node 1 size: 129010 MB > node 1 free: 124685 MB > node distances: > node 0 1 > 0: 10 32 > 1: 32 10 > > > --------------------------STREAMS---------------------------------- > > mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -I/home/tt/petsc-3.16.0/include 
-I/home/tt/petsc-3.16.0/arch-linux-c-opt/include `pwd`/MPIVersion.c > Running streams with 'mpiexec --oversubscribe -map-by socket --bind-to core --report-bindings' using 'NPMAX=40' > [cpunode1:68038] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 1 26155.1277 Rate (MB/s) > [cpunode1:68050] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68050] MCW rank 1 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 2 52098.6873 Rate (MB/s) 1.99191 > [cpunode1:68065] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68065] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68065] MCW rank 2 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 3 44731.8512 Rate (MB/s) 1.71025 > [cpunode1:68082] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68082] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68082] MCW rank 2 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68082] MCW rank 3 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
>   np   Rate (MB/s)   Speedup
>    4   59559.5275    2.27717
>    5   48477.2117    1.85345
>    6   58136.2545    2.22275
>    7   50119.2133    1.91623
>    8   57432.5057    2.19584
>    9   52345.9115    2.00137
>   10   57727.5090    2.20712
>   11   52568.6771    2.00988
>   12   57286.7990    2.19027
>   13   52721.4401    2.01572
>   14   56787.2447    2.17117
>   15   53317.0901    2.0385
>   16   56708.7028    2.16817
>   17   58994.6721    2.25557
>   18   62089.5079    2.3739
>   19   63588.1264    2.43119
>   20   67097.8382    2.56538
>
> For each of these runs, the Open MPI binding report on cpunode1 ("MCW rank N bound to socket S[core C[hwt 0]]") shows one rank bound per physical core, with the ranks split roughly evenly between socket 0 (cores 0, 1, 2, ...) and socket 1 (cores 64, 65, 66, ...) of the two-socket node.
> [cpunode1:68844] MCW rank 20 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 21 68642.9757 Rate (MB/s) 2.62446 > [cpunode1:68917] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68917] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 11 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 12 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 13 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 14 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 15 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 16 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 17 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 18 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68917] MCW rank 19 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 20 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68917] MCW rank 21 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 22 71264.2836 Rate (MB/s) 2.72468 > [cpunode1:68991] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68991] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 12 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 13 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 14 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 15 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 16 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:68991] MCW rank 17 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 18 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 19 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 20 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 21 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:68991] MCW rank 22 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > 23 72876.6138 Rate (MB/s) 2.78633 > [cpunode1:69069] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69069] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 12 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 13 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69069] MCW rank 14 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 15 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 16 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 17 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 18 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 19 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 20 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 21 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 22 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69069] MCW rank 23 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> 24 75732.6676 Rate (MB/s) 2.89552 > [cpunode1:69149] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69149] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 13 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 14 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 15 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 16 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 17 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 18 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 19 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69149] MCW rank 20 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 21 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 22 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 23 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69149] MCW rank 24 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > 25 77217.0466 Rate (MB/s) 2.95227 > [cpunode1:69232] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69232] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 13 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 14 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69232] MCW rank 15 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 16 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 17 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 18 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 19 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 20 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 21 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 22 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 23 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69232] MCW rank 24 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69232] MCW rank 25 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > 26 80035.7602 Rate (MB/s) 3.06004 > [cpunode1:69318] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69318] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 14 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 15 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 16 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 17 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:69318] MCW rank 18 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:69318] MCW ranks 19-26 bound one per core to socket 1 (cores 69-76)
> 27 80846.6416 Rate (MB/s) 3.09105
> [cpunode1:69408] MCW ranks 0-13 bound one per core to socket 0 (cores 0-13); ranks 14-27 to socket 1 (cores 64-77)
> 28 83282.5335 Rate (MB/s) 3.18418
> [cpunode1:69500] MCW ranks 0-14 bound one per core to socket 0 (cores 0-14); ranks 15-28 to socket 1 (cores 64-77)
> 29 83988.1592 Rate (MB/s) 3.21116
> [cpunode1:69596] MCW ranks 0-14 bound one per core to socket 0 (cores 0-14); ranks 15-29 to socket 1 (cores 64-78)
> 30 87241.9164 Rate (MB/s) 3.33556
> [cpunode1:69707] MCW ranks 0-15 bound one per core to socket 0 (cores 0-15); ranks 16-30 to socket 1 (cores 64-78)
> 31 87821.6811 Rate (MB/s) 3.35773
> [cpunode1:69810] MCW ranks 0-15 bound one per core to socket 0 (cores 0-15); ranks 16-31 to socket 1 (cores 64-79)
> 32 90156.4778 Rate (MB/s) 3.44699
> [cpunode1:69914] MCW ranks 0-16 bound one per core to socket 0 (cores 0-16); ranks 17-32 to socket 1 (cores 64-79)
> 33 90112.8468 Rate (MB/s) 3.44533
> [cpunode1:70021] MCW ranks 0-16 bound one per core to socket 0 (cores 0-16); ranks 17-28 to socket 1 (cores 64-75)
> [cpunode1:70021] MCW rank 29 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 30 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 31 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 32 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70021] MCW rank 33 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > 34 92366.4171 Rate (MB/s) 3.53149 > [cpunode1:70131] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70131] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70131] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 18 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 19 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 20 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 21 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 22 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 23 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 24 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70131] MCW rank 25 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 26 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 27 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 28 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 29 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 30 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 31 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 32 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 33 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70131] MCW rank 34 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] 
> 35 91504.9533 Rate (MB/s) 3.49855 > [cpunode1:70244] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70244] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 18 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 19 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70244] MCW rank 20 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 21 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 22 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 23 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 24 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 25 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 26 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 27 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 28 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 29 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70244] MCW rank 30 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 31 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 32 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 33 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 34 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70244] MCW rank 35 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > 36 94404.1634 Rate (MB/s) 3.6094 > [cpunode1:70360] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 19 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 20 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 21 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 22 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 23 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 24 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 25 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 26 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 27 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 28 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 29 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 30 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 31 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 32 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 33 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70360] MCW rank 34 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 35 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70360] MCW rank 36 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > 37 93616.1843 Rate (MB/s) 3.57927 > [cpunode1:70479] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 19 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 20 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 21 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 22 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 23 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 24 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 25 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 26 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 27 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 28 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 29 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 30 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 31 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 32 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 33 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 34 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 35 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70479] MCW rank 36 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70479] MCW rank 37 bound to socket 1[core 82[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.] > 38 95857.0121 Rate (MB/s) 3.66495 > [cpunode1:70601] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70601] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70601] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 20 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 21 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 22 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 23 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 24 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 25 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 26 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 27 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 28 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70601] MCW rank 29 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 30 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 31 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 32 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 33 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 34 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 35 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 36 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 37 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70601] MCW rank 38 bound to socket 1[core 82[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.] 
> 39 95242.8041 Rate (MB/s) 3.64146 > [cpunode1:70726] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70726] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70726] MCW rank 20 bound to socket 1[core 64[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 21 bound to socket 1[core 65[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 22 bound to socket 1[core 66[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 23 bound to socket 1[core 67[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 24 bound to socket 1[core 68[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 25 bound to socket 1[core 69[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 26 bound to socket 1[core 70[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 27 bound to socket 1[core 71[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 28 bound to socket 1[core 72[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 29 bound to socket 1[core 73[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././././.] 
> [cpunode1:70726] MCW rank 30 bound to socket 1[core 74[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 31 bound to socket 1[core 75[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 32 bound to socket 1[core 76[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 33 bound to socket 1[core 77[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 34 bound to socket 1[core 78[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 35 bound to socket 1[core 79[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 36 bound to socket 1[core 80[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 37 bound to socket 1[core 81[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 38 bound to socket 1[core 82[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][././././././././././././././././././B/././././././././././././././././././././././././././././././././././././././././././././.] > [cpunode1:70726] MCW rank 39 bound to socket 1[core 83[hwt 0]]: [./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.][./././././././././././././././././././B/./././././././././././././././././././././././././././././././././././././././././././.] 
> 40 97441.9980 Rate (MB/s) 3.72554 > ------------------------------------------------ > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:47: mpistream] Error 1 (ignored) > Traceback (most recent call last): > File "process.py", line 89, in > process(sys.argv[1],len(sys.argv)-2) > File "process.py", line 33, in process > speedups[i] = triads[i]/triads[0] > TypeError: 'dict_values' object does not support indexing > make[2]: [makefile:79: mpistreams] Error 1 (ignored) > From: Barry Smith > > Sent: Tuesday, October 10, 2023 10:39 PM > To: Gong Yujie > > Cc: PETSc users list > > Subject: Re: [petsc-users] Scalability problem using PETSc with local installed OpenMPI > > > Run STREAMS with > > MPI_BINDING="-map-by socket --bind-to core --report-bindings" make mpistreams > > send the result > > Also run > > lscpu > numactl -H > > if they are available on your machine, send the result > > >> On Oct 10, 2023, at 10:17 AM, Gong Yujie > wrote: >> >> Dear Barry, >> >> I tried to use the binding as suggested by PETSc: >> mpiexec -n 4 --map-by socket --bind-to socket --report-bindings >> But it seems not improving the performance. Here is the make stream log >> >> Best Regards, >> Yujie >> >> mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O -I/home/tt/petsc-3.16.0/include -I/home/tt/petsc-3.16.0/arch-linux-c-opt/include `pwd`/MPIVersion.c >> Running streams with 'mpiexec --oversubscribe ' using 'NPMAX=16' >> 1 26119.1937 Rate (MB/s) >> 2 29833.4281 Rate (MB/s) 1.1422 >> 3 65338.5050 Rate (MB/s) 2.50155 >> 4 59832.7482 Rate (MB/s) 2.29076 >> 5 48629.8396 Rate (MB/s) 1.86184 >> 6 58569.4289 Rate (MB/s) 2.24239 >> 7 63827.1144 Rate (MB/s) 2.44369 >> 8 57448.5349 Rate (MB/s) 2.19948 >> 9 61405.3273 Rate (MB/s) 2.35097 >> 10 68021.6111 Rate (MB/s) 2.60428 >> 11 71289.0422 Rate (MB/s) 2.72937 >> 12 76900.6386 Rate (MB/s) 2.94422 >> 13 80198.6807 Rate (MB/s) 3.07049 >> 14 64846.3685 Rate (MB/s) 2.48271 >> 15 83072.8631 Rate (MB/s) 3.18053 >> 16 70128.0166 Rate (MB/s) 2.68492 >> ------------------------------------------------ >> Traceback (most recent call last): >> File "process.py", line 89, in >> process(sys.argv[1],len(sys.argv)-2) >> File "process.py", line 33, in process >> speedups[i] = triads[i]/triads[0] >> TypeError: 'dict_values' object does not support indexing >> make[2]: [makefile:47: mpistream] Error 1 (ignored) >> Traceback (most recent call last): >> File "process.py", line 89, in >> process(sys.argv[1],len(sys.argv)-2) >> File "process.py", line 33, in process >> speedups[i] = triads[i]/triads[0] >> TypeError: 'dict_values' object does not support indexing >> make[2]: [makefile:79: mpistreams] Error 1 (ignored) >> From: Barry Smith > >> Sent: Tuesday, October 10, 2023 9:59 PM >> To: Gong Yujie > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Scalability problem using PETSc with local installed OpenMPI >> >> >> Take a look at https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup >> >> Check the binding that OpenMPI is using (by the way, there are much more recent OpenMPI versions, I suggest using them). Run the STREAMS benchmark as indicated on that page. 
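The process.py traceback that appears at the end of the STREAMS runs above is a separate, minor issue: in Python 3, dict.values() returns a view that cannot be indexed, which is what raises the TypeError; the make target reports the error as ignored, and the per-process rates are printed before that step. A minimal sketch of the failing pattern and a workaround, offered purely as an illustration: only the variable names come from the traceback, the numbers are made up to mirror the log, and this is not the actual process.py source.

# Illustrative only; not the actual process.py source.
rates = {1: 26119.1937, 2: 29833.4281, 3: 65338.5050}  # np -> triad rate (MB/s)
triads = rates.values()        # dict_values view: not indexable in Python 3

triads = list(triads)          # workaround: materialize the view first
speedups = {}
for i in range(len(triads)):
    speedups[i] = triads[i] / triads[0]
print(speedups)                # {0: 1.0, 1: 1.1422..., 2: 2.5015...}, matching the ratios in the log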
>> >> Barry >> >>> On Oct 10, 2023, at 9:27 AM, Gong Yujie > wrote: >>> >>> Dear PETSc developers, >>> >>> I installed OpenMPI3 first and then installed PETSc with that MPI. Currently, I'm facing a scalability issue. In detail, I tested using OpenMPI to calculate the addition of two distributed arrays and I get good scalability. The problem is that when I calculate the addition of two vectors in PETSc, I don't get any scalability. For the same size of the problem, PETSc costs a lot more time than merely using OpenMPI. >>> >>> My PETSc version is 3.16.0 and the version of OpenMPI is 3.1.4. Hope you can give me some suggestions. >>> >>> Best Regards, >>> Yujie -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Tue Oct 10 16:33:48 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Tue, 10 Oct 2023 23:33:48 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: Hi all, Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between the number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). """Experimenting with PETSc mat-mat multiplication""" import time import numpy as np from colorama import Fore from firedrake import COMM_SELF, COMM_WORLD from firedrake.petsc import PETSc from mpi4py import MPI from numpy.testing import assert_array_almost_equal from utilities import Print nproc = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparse or dense. Defaults to True.
Returns: PETSc mat: PETSc mpi matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix # -------------------------------------------- # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi # A' = Phi.T * A * Phi # [k x k] <- [k x m] x [m x m] x [m x k] # -------------------------------------------- m, k = 100, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, m)) Phi_np = np.random.randint(low=0, high=6, size=(m, k)) # -------------------------------------------- # TEST: Galerking projection of numpy matrices A_np and Phi_np # -------------------------------------------- Aprime_np = Phi_np.T @ A_np @ Phi_np Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") Print(f"{Aprime_np}") # Create A as an mpi matrix distributed on each process A = create_petsc_matrix(A_np, sparse=False) # Create Phi as an mpi matrix distributed on each process Phi = create_petsc_matrix(Phi_np, sparse=False) # Create an empty PETSc matrix object to store the result of the PtAP operation. # This will hold the result A' = Phi.T * A * Phi after the computation. A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) # Perform the PtAP (Phi Transpose times A times Phi) operation. # In mathematical terms, this operation is A' = Phi.T * A * Phi. # A_prime will store the result of the operation. A_prime = A.ptap(Phi) Here is the error MATRIX mpiaij A [100x100] Assembled Partitioning for A: Rank 0: Rows [0, 34) Rank 1: Rows [34, 67) Rank 2: Rows [67, 100) MATRIX mpiaij Phi [100x7] Assembled Partitioning for Phi: Rank 0: Rows [0, 34) Rank 1: Rows [34, 67) Rank 2: Rows [67, 100) Traceback (most recent call last): File "/Users/boutsitron/work/galerkin_projection.py", line 87, in A_prime = A.ptap(Phi) ^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap petsc4py.PETSc.Error: error code 60 [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [0] Nonconforming object sizes [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 Any thoughts? 
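One possible culprit, offered as a sketch rather than a confirmed diagnosis: the error compares A's column ownership range (0, 100) on rank 0 with Phi's row ownership range (0, 34), and MatPtAP requires those two layouts to match. In create_petsc_matrix the size tuple ((None, global_rows), (global_cols, global_cols)) pins the local column size to the global column count instead of letting PETSc split it. A variant that leaves both local sizes to PETSc (None, i.e. PETSC_DECIDE), so that the column layout of the [100 x 100] A and the row layout of the [100 x 7] Phi are split identically across ranks, could look like the following; it reuses the imports from the script above and the helper name is changed to mark it as hypothetical:

def create_petsc_matrix_decide(input_array, sparse=True):
    """Sketch: like create_petsc_matrix above, but PETSc decides both local sizes."""
    assert len(input_array.shape) == 2, "Input array should be 2-dimensional"
    global_rows, global_cols = input_array.shape
    # ((local_rows, global_rows), (local_cols, global_cols)); None = PETSC_DECIDE
    size = ((None, global_rows), (None, global_cols))
    if sparse:
        matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD)
    else:
        matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD)
    matrix.setUp()
    local_rows_start, local_rows_end = matrix.getOwnershipRange()
    for i in range(local_rows_start, local_rows_end):
        # input_array is replicated on every rank, so global row i can be read directly
        matrix.setValues(i, range(global_cols), input_array[i, :], addv=False)
    matrix.assemblyBegin()
    matrix.assemblyEnd()
    return matrix

A = create_petsc_matrix_decide(A_np)      # AIJ, as in the error output above
Phi = create_petsc_matrix_decide(Phi_np)
A_prime = A.ptap(Phi)                     # A' = Phi.T * A * Phi with compatible layouts

Whether this is what actually happened in the run above cannot be confirmed from the log alone, so treat it as a starting point for debugging rather than a fix.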
Thanks, Thanos > On 5 Oct 2023, at 14:23, Thanasis Boutsikakis wrote: > > This works Pierre. Amazing input, thanks a lot! > >> On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: >> >> Not a petsc4py expert here, but you may to try instead: >> A_prime = A.ptap(Phi) >> >> Thanks, >> Pierre >> >>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: >>> >>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>> >>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>> [0]PETSC ERROR: to get more information on the crash. >>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import time >>> >>> import numpy as np >>> from colorama import Fore >>> from firedrake import COMM_SELF, COMM_WORLD >>> from firedrake.petsc import PETSc >>> from mpi4py import MPI >>> from numpy.testing import assert_array_almost_equal >>> >>> from utilities import ( >>> Print, >>> create_petsc_matrix, >>> print_matrix_partitioning, >>> ) >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 11, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # -------------------------------------------- >>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>> # -------------------------------------------- >>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>> Print(f"{Aprime_np}") >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np, sparse=False) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>> >>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>> >>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>> # A_prime will store the result of the operation. >>> Phi.PtAP(A, A_prime) >>> >>>> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >>>> >>>> How about using ptap which will use MatPtAP? >>>> It will be more efficient (and it will help you bypass the issue). 
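For reference, a small sketch of both routes in petsc4py, assuming A (m x m) and Phi (m x k) are already assembled PETSc matrices as in the scripts above. Letting the calls allocate their own result objects avoids handing in a preallocated result of an unsuitable type, although support for each combination still depends on the matrix types involved:

# Sketch only: A and Phi are assumed to be assembled petsc4py Mat objects
A_prime = A.ptap(Phi)            # one call: A' = Phi^T * A * Phi (MatPtAP underneath)

# Two-step alternative, with the intermediate created by the call itself
A1 = Phi.transposeMatMult(A)     # A1 = Phi^T * A   (k x m)
A_prime2 = A1.matMult(Phi)       # A'  = A1 * Phi   (k x k)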
>>>> >>>> Thanks, >>>> Pierre >>>> >>>>> On 5 Oct 2023, at 1:18 PM, Thanasis Boutsikakis wrote: >>>>> >>>>> Sorry, forgot function create_petsc_matrix() >>>>> >>>>> def create_petsc_matrix(input_array, sparse=True): >>>>> """Create a PETSc matrix from an input_array >>>>> >>>>> Args: >>>>> input_array (np array): Input array >>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>> sparse (bool, optional): Toggle for sparse or dense. Defaults to True. >>>>> >>>>> Returns: >>>>> PETSc mat: PETSc matrix >>>>> """ >>>>> # Check if input_array is 1D and reshape if necessary >>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>> global_rows, global_cols = input_array.shape >>>>> >>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>> >>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>> if sparse: >>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>> else: >>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>> matrix.setUp() >>>>> >>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>> >>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>> # Calculate the correct row in the array for the current process >>>>> row_in_array = counter + local_rows_start >>>>> matrix.setValues( >>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>> ) >>>>> >>>>> # Assemble the matrix to compute the final structure >>>>> matrix.assemblyBegin() >>>>> matrix.assemblyEnd() >>>>> >>>>> return matrix >>>>> >>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get Phi.transposeMatMult(A, A1) to work. The error is >>>>>> >>>>>> Phi.transposeMatMult(A, A1) >>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>> petsc4py.PETSc.Error: error code 56 >>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>> [0] No support for this operation for this object type >>>>>> [0] Call MatProductCreate() first >>>>>> >>>>>> Do you know if these are exposed to petsc4py, or maybe there is another way?
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>> >>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>> >>>>>> import time >>>>>> >>>>>> import numpy as np >>>>>> from colorama import Fore >>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from mpi4py import MPI >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> from utilities import ( >>>>>> Print, >>>>>> create_petsc_matrix, >>>>>> ) >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> # -------------------------------------------- >>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>> # A' = Phi.T * A * Phi >>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>> # -------------------------------------------- >>>>>> >>>>>> m, k = 11, 7 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> >>>>>> # Create A as an mpi matrix distributed on each process >>>>>> A = create_petsc_matrix(A_np) >>>>>> >>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>> >>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>> >>>>>> # Now A1 contains the result of Phi^T * A >>>>>> Phi.transposeMatMult(A, A1) >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Tue Oct 10 18:01:13 2023 From: erdemguer at proton.me (erdemguer) Date: Tue, 10 Oct 2023 23:01:13 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: Hi, Sorry for my late response. I tried your suggestions and I think I made some progress, but I still have issues. Let me explain my latest mesh routine: - DMPlexCreateBoxMesh - DMSetFromOptions - PetscSectionCreate - PetscSectionSetNumFields - PetscSectionSetFieldDof - PetscSectionSetDof - PetscSectionSetUp - DMSetLocalSection - DMSetAdjacency - DMPlexDistribute It's still not working, but it's promising: if I call DMPlexGetDepthStratum for cells, I can see that after distribution the processors have more cells. But I couldn't figure out how to decide where the ghost/processor boundary cells start. In older mails I saw there is a function DMPlexGetHybridBounds, but I think that function is deprecated. I tried to use DMPlexGetCellTypeStratum, as in ts/tutorials/ex11_sa.c, but I'm getting -1 as cEndInterior before and after distribution. I tried it for the DM_POLYTOPE_FV_GHOST and DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum, but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution, but I think there is a better way I'm currently missing. Thanks again, Guer. ------- Original Message ------- On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < knepley at gmail.com> wrote: > On Thu, Sep 28, 2023 at 3:38 PM erdemguer via petsc-users wrote: >> Hi, >> >> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >> >> Create a DMPlex mesh from GMSH. >> Reorder it with DMPlexPermute. >> Create necessary pre-processing arrays related to the mesh/problem. >> Create field(s) with multi-dofs. >> Create residual vectors.
>> Define a function to calculate the residual for each cell and, use SNES. >> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? > > The intention was that there is enough information in the manual to do this. > > Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. > > So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. > > Thanks, > > Matt > >> Thank you. >> Guer. >> >> Sent with [Proton Mail](https://proton.me/) secure email. > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 10 19:26:34 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Oct 2023 20:26:34 -0400 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis < thanasis.boutsikakis at corintis.com> wrote: > Hi all, > > Revisiting my code and the proposed solution from Pierre, I realized this > works only in sequential. The reason is that PETSc partitions those > matrices only row-wise, which leads to an error due to the mismatch between > number of columns of A (non-partitioned) and the number of rows of Phi > (partitioned). > Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. Thanks, Matt > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import Print > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc mpi matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc > matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 100, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP > operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. 
> A_prime = A.ptap(Phi) > > Here is the error > > MATRIX mpiaij A [100x100] > Assembled > > Partitioning for A: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > MATRIX mpiaij Phi [100x7] > Assembled > > Partitioning for Phi: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > Traceback (most recent call last): > File "/Users/boutsitron/work/galerkin_projection.py", line 87, in > > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > petsc4py.PETSc.Error: error code 60 > [0] MatPtAP() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [0] MatProductSetFromOptions() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [0] MatProductSetFromOptions_Private() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [0] MatProductSetFromOptions_MPIAIJ() at > /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [0] MatProductSetFromOptions_MPIAIJ_PtAP() at > /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [0] Nonconforming object sizes > [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) > Abort(1) on node 0 (rank 0 in comm 496): application called > MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > > > Any thoughts? > > Thanks, > Thanos > > On 5 Oct 2023, at 14:23, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > This works Pierre. Amazing input, thanks a lot! > > On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: > > Not a petsc4py expert here, but you may to try instead: > A_prime = A.ptap(Phi) > > Thanks, > Pierre > > On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > Thanks Pierre! So I tried this and got a segmentation fault. Is this > supposed to work right off the bat or am I missing sth? > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is > causing the crash. 
> Abort(59) on node 0 (rank 0 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > print_matrix_partitioning, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc > matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP > operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. > Phi.PtAP(A, A_prime) > > On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: > > How about using ptap which will use MatPtAP? > It will be more efficient (and it will help you bypass the issue). > > Thanks, > Pierre > > On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > Sorry, forgot function create_petsc_matrix() > > def create_petsc_matrix(input_array sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > On 5 Oct 2023, at 13:09, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > Hi everyone, > > I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, > A1) work. The error is > > Phi.transposeMatMult(A, A1) > File "petsc4py/PETSc/Mat.pyx", line 1514, in > petsc4py.PETSc.Mat.transposeMatMult > petsc4py.PETSc.Error: error code 56 > [0] MatTransposeMatMult() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 > [0] MatProduct_Private() at > /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 > [0] No support for this operation for this object type > [0] Call MatProductCreate() first > > Do you know if these exposed to petsc4py or maybe there is another way? I > cannot get the MFE to work (neither in sequential nor in parallel) > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import ( > Print, > create_petsc_matrix, > ) > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc > matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np) > > A1 = create_petsc_matrix(np.zeros((k, m))) > > # Now A1 contains the result of Phi^T * A > Phi.transposeMatMult(A, A1) > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Oct 10 19:33:18 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Oct 2023 20:33:18 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: > > Hi, > Sorry for my late response. I tried with your suggestions and I think I > made a progress. But I still got issues. Let me explain my latest mesh > routine: > > > 1. DMPlexCreateBoxMesh > 2. DMSetFromOptions > 3. PetscSectionCreate > 4. PetscSectionSetNumFields > 5. PetscSectionSetFieldDof > 6. PetscSectionSetDof > 7. PetscSectionSetUp > 8. DMSetLocalSection > 9. DMSetAdjacency > 10. DMPlexDistribute > > > It's still not working but it's promising, if I call DMPlexGetDepthStratum > for cells, I can see that after distribution processors have more cells. > Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put DMViewFromOptions(dm, NULL, "-dm1_view") with a different string after each call. > But I couldn't figure out how to decide where the ghost/processor boundary > cells start. > Please send the actual code because the above is not specific enough. For example, you will not have "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, so each process gets a unique set. Thanks, Matt > In older mails I saw there is a function DMPlexGetHybridBounds but I > think that function is deprecated. I tried to use, > DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting -1 > as cEndInterior before and after distribution. I tried it for > DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also > tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but > nothing changed. I think I can calculate the ghost cell indices using > cStart/cEnd before & after distribution but I think there is a better way > I'm currently missing. > > Thanks again, > Guer. > > ------- Original Message ------- > On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> I am currently using DMPlex in my code. It runs serially at the moment, >> but I'm interested in adding parallel options. Here is my workflow: >> >> Create a DMPlex mesh from GMSH. >> Reorder it with DMPlexPermute. >> Create necessary pre-processing arrays related to the mesh/problem. >> Create field(s) with multi-dofs. >> Create residual vectors. >> Define a function to calculate the residual for each cell and, use SNES. >> As you can see, I'm not using FV or FE structures (most examples do). >> Now, I'm trying to implement this in parallel using a similar approach. >> However, I'm struggling to understand how to create corresponding vectors >> and how to obtain index sets for each processor. Is there a tutorial or >> paper that covers this topic? >> > > The intention was that there is enough information in the manual to do > this. > > Using PetscFE/PetscFV is not required. However, I strongly encourage you > to use PetscSection. Without this, it would be incredibly hard to do what > you want. Once the DM has a Section, it can do things like automatically > create vectors and matrices for you. It can redistribute them, subset them, > etc. The Section describes how dofs are assigned to pieces of the mesh > (mesh points). This is in the manual, and there are a few examples that do > it by hand. 
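A minimal petsc4py sketch of that kind of setup, with purely illustrative sizes and layout (a small box mesh, one field with one dof per cell, and a distribution with overlap so that ghost cells exist); it is not code from this thread:

from petsc4py import PETSc

# Illustrative: 4x4 quadrilateral box mesh, one dof per cell
dm = PETSc.DMPlex().createBoxMesh([4, 4], simplex=False, comm=PETSc.COMM_WORLD)
dm.setBasicAdjacency(True, False)   # FV-style adjacency (useCone=True, useClosure=False)
dm.distribute(overlap=1)            # overlap=1 gives each rank a layer of ghost cells
dm.view()                           # inspect the mesh at this stage, as suggested above

# Build a Section on the distributed mesh: one field, one dof on every cell
sec = PETSc.Section().create(comm=dm.comm)
sec.setNumFields(1)
pStart, pEnd = dm.getChart()
sec.setChart(pStart, pEnd)
cStart, cEnd = dm.getHeightStratum(0)   # cells, including the overlap cells
for c in range(cStart, cEnd):
    sec.setDof(c, 1)
    sec.setFieldDof(c, 0, 1)
sec.setUp()
dm.setLocalSection(sec)

# With a Section attached, the DM hands out matching vectors automatically
gvec = dm.createGlobalVector()      # owned dofs only
lvec = dm.createLocalVector()       # owned + ghost dofs
# Local points that appear as leaves of dm.getPointSF() are owned by another
# rank, which is one way to tell the overlap (ghost) cells from the owned ones.

Here the Section is built after the distribution only to keep the sketch short.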
> > So I suggest changing your code to use PetscSection, and then letting us > know if things still do not work. > > Thanks, > > Matt > >> Thank you. >> Guer. >> >> Sent with Proton Mail secure email. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 10 19:42:56 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 10 Oct 2023 20:42:56 -0400 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: This looks like a false positive or there is some subtle bug here that we are not seeing. Could this be the first time parallel PtAP has been used (and reported) in petsc4py? Mark On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley wrote: > On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > >> Hi all, >> >> Revisiting my code and the proposed solution from Pierre, I realized this >> works only in sequential. The reason is that PETSc partitions those >> matrices only row-wise, which leads to an error due to the mismatch between >> number of columns of A (non-partitioned) and the number of rows of Phi >> (partitioned). >> > > Are you positive about this? P^T A P is designed to run in this scenario, > so either we have a bug or the diagnosis is wrong. > > Thanks, > > Matt > > >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import Print >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> def create_petsc_matrix(input_array, sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> PETSc mat: PETSc mpi matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc >> matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 100, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >> Print(f"{Aprime_np}") >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=False) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=False) >> >> # Create an empty PETSc matrix object to store the result of the PtAP >> operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. 
>> A_prime = A.ptap(Phi) >> >> Here is the error >> >> MATRIX mpiaij A [100x100] >> Assembled >> >> Partitioning for A: >> Rank 0: Rows [0, 34) >> Rank 1: Rows [34, 67) >> Rank 2: Rows [67, 100) >> >> MATRIX mpiaij Phi [100x7] >> Assembled >> >> Partitioning for Phi: >> Rank 0: Rows [0, 34) >> Rank 1: Rows [34, 67) >> Rank 2: Rows [67, 100) >> >> Traceback (most recent call last): >> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >> >> A_prime = A.ptap(Phi) >> ^^^^^^^^^^^ >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> petsc4py.PETSc.Error: error code 60 >> [0] MatPtAP() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [0] MatProductSetFromOptions() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [0] MatProductSetFromOptions_Private() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [0] MatProductSetFromOptions_MPIAIJ() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [0] Nonconforming object sizes >> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >> Abort(1) on node 0 (rank 0 in comm 496): application called >> MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >> >> >> Any thoughts? >> >> Thanks, >> Thanos >> >> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> This works Pierre. Amazing input, thanks a lot! >> >> On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: >> >> Not a petsc4py expert here, but you may to try instead: >> A_prime = A.ptap(Phi) >> >> Thanks, >> Pierre >> >> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> Thanks Pierre! So I tried this and got a segmentation fault. Is this >> supposed to work right off the bat or am I missing sth? >> >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >> https://petsc.org/release/faq/ >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >> causing the crash. 
>> Abort(59) on node 0 (rank 0 in comm 0): application called >> MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> print_matrix_partitioning, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc >> matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >> Print(f"{Aprime_np}") >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=False) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=False) >> >> # Create an empty PETSc matrix object to store the result of the PtAP >> operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. >> Phi.PtAP(A, A_prime) >> >> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >> >> How about using ptap which will use MatPtAP? >> It will be more efficient (and it will help you bypass the issue). >> >> Thanks, >> Pierre >> >> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> Sorry, forgot function create_petsc_matrix() >> >> def create_petsc_matrix(input_array sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> PETSc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis < >> thanasis.boutsikakis at corintis.com> wrote: >> >> Hi everyone, >> >> I am trying a Galerkin projection (see MFE below) and I cannot get the >> Phi.transposeMatMult(A, A1) work. The error is >> >> Phi.transposeMatMult(A, A1) >> File "petsc4py/PETSc/Mat.pyx", line 1514, in >> petsc4py.PETSc.Mat.transposeMatMult >> petsc4py.PETSc.Error: error code 56 >> [0] MatTransposeMatMult() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >> [0] MatProduct_Private() at >> /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >> [0] No support for this operation for this object type >> [0] Call MatProductCreate() first >> >> Do you know if these exposed to petsc4py or maybe there is another way? I >> cannot get the MFE to work (neither in sequential nor in parallel) >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import time >> >> import numpy as np >> from colorama import Fore >> from firedrake import COMM_SELF, COMM_WORLD >> from firedrake.petsc import PETSc >> from mpi4py import MPI >> from numpy.testing import assert_array_almost_equal >> >> from utilities import ( >> Print, >> create_petsc_matrix, >> ) >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc >> matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np) >> >> A1 = create_petsc_matrix(np.zeros((k, m))) >> >> # Now A1 contains the result of Phi^T * A >> Phi.transposeMatMult(A, A1) >> >> >> >> >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bldenton at buffalo.edu Tue Oct 10 20:34:16 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 01:34:16 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization Message-ID: Good Evening, I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. Thank you in advance for your time. Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 10 21:18:07 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 10 Oct 2023 20:18:07 -0600 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: <401ffc8a-38ec-4a30-a26d-8c8028ccfcca@app.fastmail.com> Do you want to write a new code using only PETSc or would you be up for collaborating on ceed-fluids, which is a high-performance compressible SUPG solver based on DMPlex with good GPU support? It uses the metric to compute covariant length for stabilization. We have YZ? shock capturing, though it hasn't been tested much beyond shock tube experiments. (Most of our work has been low Mach.) https://libceed.org/en/latest/examples/fluids/ https://github.com/CEED/libCEED/blob/main/examples/fluids/qfunctions/stabilization.h#L76 On Tue, Oct 10, 2023, at 7:34 PM, Brandon Denton via petsc-users wrote: > Good Evening, > > I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. > > Thank you in advance for your time. > Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bldenton at buffalo.edu Tue Oct 10 22:54:11 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 03:54:11 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: <401ffc8a-38ec-4a30-a26d-8c8028ccfcca@app.fastmail.com> References: <401ffc8a-38ec-4a30-a26d-8c8028ccfcca@app.fastmail.com> Message-ID: My initial plan was to write a new code using only PETSc. However, I don't see how to do what I want within the point-wise residual function. Am I missing something? Yes. I would be interested in collaborating on the ceed-fluids. I took a quick look at the links you provided and it looks interesting. I'll warn you though. I'm a Mechanical Engineer by trade/training. The calculus and programming sometimes take me a little while to wrap my head around. Let me know how I can help. In the meantime, I'll continue to review the information you sent over. 
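For reference while digging into the linked stabilization code, here is a small NumPy sketch of the metric-based length scale mentioned above: the covariant element metric G = (dxi/dx)^T (dxi/dx) built from the inverse of the coordinate Jacobian, and one common Shakib/Tezduyar-style tau_SUPG assembled from it. The constant C_I, the time step, and the sample Jacobian below are illustrative assumptions, not necessarily what ceed-fluids uses:

import numpy as np

def supg_tau(dx_dxi, u, nu, dt, C_I=36.0):
    # dx_dxi : (dim, dim) coordinate Jacobian dx/dxi at a quadrature point
    # u      : (dim,) advective velocity at that point
    # nu     : kinematic viscosity
    # dt     : time step (drop the 4/dt**2 term for steady problems)
    # C_I    : inverse-estimate constant (element/order dependent; 36 is a
    #          common choice for linear elements and is an assumption here)
    dxi_dx = np.linalg.inv(dx_dxi)       # natural-coordinate gradients dxi/dx
    G = dxi_dx.T @ dxi_dx                # covariant metric g_ij
    adv = u @ G @ u                      # u . G u
    diff = C_I * nu**2 * np.sum(G * G)   # C_I nu^2 (G : G)
    return 1.0 / np.sqrt(4.0 / dt**2 + adv + diff)

# Purely illustrative numbers
tau = supg_tau(np.diag([0.1, 0.05]), np.array([1.0, 0.2]), nu=1e-3, dt=1e-2)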
________________________________ From: Jed Brown Sent: Tuesday, October 10, 2023 10:18 PM To: Brandon Denton ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization Do you want to write a new code using only PETSc or would you be up for collaborating on ceed-fluids, which is a high-performance compressible SUPG solver based on DMPlex with good GPU support? It uses the metric to compute covariant length for stabilization. We have YZ? shock capturing, though it hasn't been tested much beyond shock tube experiments. (Most of our work has been low Mach.) https://libceed.org/en/latest/examples/fluids/ https://github.com/CEED/libCEED/blob/main/examples/fluids/qfunctions/stabilization.h#L76 On Tue, Oct 10, 2023, at 7:34 PM, Brandon Denton via petsc-users wrote: Good Evening, I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. Thank you in advance for your time. Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed Oct 11 00:18:10 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 11 Oct 2023 07:18:10 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). Thanks, Pierre > On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: > > This looks like a false positive or there is some subtle bug here that we are not seeing. > Could this be the first time parallel PtAP has been used (and reported) in petsc4py? > > Mark > > On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>> Hi all, >>> >>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >> >> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. 
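A short petsc4py sketch of the two-step product Pierre suggests above (B = A Phi, then A' = Phi^T B), assuming A (m x m) and Phi (m x k) have already been assembled as in the scripts in this thread; as Pierre notes, the dense transpose-product step may still hit the same MatProduct limitation, in which case please report back:

B = A.matMult(Phi)                  # B  = A * Phi        (m x k)
A_prime = Phi.transposeMatMult(B)   # A' = Phi^T * B      (k x k)

# One-shot alternative discussed in this thread:
# A_prime = A.ptap(Phi)             # A' = Phi^T * A * Phi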
>> >> Thanks, >> >> Matt >> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import time >>> >>> import numpy as np >>> from colorama import Fore >>> from firedrake import COMM_SELF, COMM_WORLD >>> from firedrake.petsc import PETSc >>> from mpi4py import MPI >>> from numpy.testing import assert_array_almost_equal >>> >>> from utilities import Print >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> def create_petsc_matrix(input_array, sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>> >>> Returns: >>> PETSc mat: PETSc mpi matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 100, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # -------------------------------------------- >>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>> # -------------------------------------------- >>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>> Print(f"{Aprime_np}") >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np, sparse=False) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>> >>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>> >>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>> # A_prime will store the result of the operation. 
>>> A_prime = A.ptap(Phi) >>> >>> Here is the error >>> >>> MATRIX mpiaij A [100x100] >>> Assembled >>> >>> Partitioning for A: >>> Rank 0: Rows [0, 34) >>> Rank 1: Rows [34, 67) >>> Rank 2: Rows [67, 100) >>> >>> MATRIX mpiaij Phi [100x7] >>> Assembled >>> >>> Partitioning for Phi: >>> Rank 0: Rows [0, 34) >>> Rank 1: Rows [34, 67) >>> Rank 2: Rows [67, 100) >>> >>> Traceback (most recent call last): >>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>> A_prime = A.ptap(Phi) >>> ^^^^^^^^^^^ >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> petsc4py.PETSc.Error: error code 60 >>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [0] Nonconforming object sizes >>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>> >>> Any thoughts? >>> >>> Thanks, >>> Thanos >>> >>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>> >>>> This works Pierre. Amazing input, thanks a lot! >>>> >>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>> >>>>> Not a petsc4py expert here, but you may to try instead: >>>>> A_prime = A.ptap(Phi) >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>> >>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>> >>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>> >>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>> >>>>>> import time >>>>>> >>>>>> import numpy as np >>>>>> from colorama import Fore >>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from mpi4py import MPI >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> from utilities import ( >>>>>> Print, >>>>>> create_petsc_matrix, >>>>>> print_matrix_partitioning, >>>>>> ) >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> # -------------------------------------------- >>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>> # A' = Phi.T * A * Phi >>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>> # -------------------------------------------- >>>>>> >>>>>> m, k = 11, 7 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> >>>>>> # -------------------------------------------- >>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>> # -------------------------------------------- >>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>> Print(f"{Aprime_np}") >>>>>> >>>>>> # Create A as an mpi matrix distributed on each process >>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>> >>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>> >>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>> >>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>> # A_prime will store the result of the operation. >>>>>> Phi.PtAP(A, A_prime) >>>>>> >>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>> >>>>>>> How about using ptap which will use MatPtAP? >>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>> >>>>>>> Thanks, >>>>>>> Pierre >>>>>>> >>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>> >>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>> >>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>> >>>>>>>> Args: >>>>>>>> input_array (np array): Input array >>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>> >>>>>>>> Returns: >>>>>>>> PETSc mat: PETSc matrix >>>>>>>> """ >>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>> >>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>> >>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>> if sparse: >>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>> else: >>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>> matrix.setUp() >>>>>>>> >>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>> >>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>> row_in_array = counter + local_rows_start >>>>>>>> matrix.setValues( >>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>> ) >>>>>>>> >>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>> matrix.assemblyBegin() >>>>>>>> matrix.assemblyEnd() >>>>>>>> >>>>>>>> return matrix >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>> >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>> >>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>> [0] No support for this operation for this object type >>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>> >>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>> >>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>> >>>>>>>>> import time >>>>>>>>> >>>>>>>>> import numpy as np >>>>>>>>> from colorama import Fore >>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>> from mpi4py import MPI >>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>> >>>>>>>>> from utilities import ( >>>>>>>>> Print, >>>>>>>>> create_petsc_matrix, >>>>>>>>> ) >>>>>>>>> >>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>> >>>>>>>>> # -------------------------------------------- >>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>> # -------------------------------------------- >>>>>>>>> >>>>>>>>> m, k = 11, 7 >>>>>>>>> # Generate the random numpy matrices >>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>> >>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>> >>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>> >>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>> >>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Oct 11 01:41:22 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 11 Oct 2023 08:41:22 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: Message-ID: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> Kenneth, The MatDuplicate issue should be fixed in the following MR https://gitlab.com/petsc/petsc/-/merge_requests/6912 Note that the NLEIGS solver internally uses MatDuplicate for creating multiple copies of the shell matrix, each one with its own value of lambda. Hence your implementation of the shell matrix is not appropriate, since you have a single global lambda within the module. I have attempted to write a Fortran example that duplicates the lambda correctly (see the MR), but does not work yet. Jose > El 6 oct 2023, a las 22:28, Kenneth C Hall escribi?: > > Jose, > > Unfortunately, I was unable to implement the MATOP_DUPLICATE operation in fortran (and I do not know enough c to work in c). Here is the error message I get: > > [0]PETSC ERROR: #1 MatShellSetOperation_Fortran() at /Users/hall/Documents/Fortran_Codes/Packages/petsc/src/mat/impls/shell/ftn-custom/zshellf.c:283 > [0]PETSC ERROR: #2 src/test_nep.f90:62 > > When I look at zshellf.c, MATOP_DUPLICATE is not one of the supported operations. See below. > > Kenneth > > > /** > * Subset of MatOperation that is supported by the Fortran wrappers. 
> */ > enum FortranMatOperation { > FORTRAN_MATOP_MULT = 0, > FORTRAN_MATOP_MULT_ADD = 1, > FORTRAN_MATOP_MULT_TRANSPOSE = 2, > FORTRAN_MATOP_MULT_TRANSPOSE_ADD = 3, > FORTRAN_MATOP_SOR = 4, > FORTRAN_MATOP_TRANSPOSE = 5, > FORTRAN_MATOP_GET_DIAGONAL = 6, > FORTRAN_MATOP_DIAGONAL_SCALE = 7, > FORTRAN_MATOP_ZERO_ENTRIES = 8, > FORTRAN_MATOP_AXPY = 9, > FORTRAN_MATOP_SHIFT = 10, > FORTRAN_MATOP_DIAGONAL_SET = 11, > FORTRAN_MATOP_DESTROY = 12, > FORTRAN_MATOP_VIEW = 13, > FORTRAN_MATOP_CREATE_VECS = 14, > FORTRAN_MATOP_GET_DIAGONAL_BLOCK = 15, > FORTRAN_MATOP_COPY = 16, > FORTRAN_MATOP_SCALE = 17, > FORTRAN_MATOP_SET_RANDOM = 18, > FORTRAN_MATOP_ASSEMBLY_BEGIN = 19, > FORTRAN_MATOP_ASSEMBLY_END = 20, > FORTRAN_MATOP_SIZE = 21 > }; > > > From: Jose E. Roman > Date: Friday, October 6, 2023 at 7:01 AM > To: Kenneth C Hall > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) > > I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. > Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? > > Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). > > Jose > > > > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > > > Hi all, > > > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > > > !.... Create matrix-free operators for A and B > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > > > !.... Create nonlinear eigensolver > > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > > > !.... Set the problem type > > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > > ! > > !.... set the solver type > > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > > ! > > !.... Set functions and Jacobians for NEP > > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > > > Any help on getting this problem properly set up would be greatly appreciated. 
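For anyone following this in Python, a rough petsc4py/slepc4py sketch of the point Jose makes above about MatDuplicate: lambda has to live in the shell matrix's own context, so that every internal copy of T can carry its own value, rather than in a module-level variable. The 10x10 dense stand-in operator is a placeholder, and the callback wiring is an assumption (NEP.setFunction is assumed to mirror NEPSetFunction); this is a sketch, not the Fortran fix itself:

import numpy as np
from petsc4py import PETSc
from slepc4py import SLEPc

n = 10
A_np = np.random.rand(n, n)  # stand-in for the real operator

class TShellCtx:
    # Context for T(lambda) = A - lambda*I; lambda is stored per instance,
    # so a duplicated matrix can hold a different value than the original.
    def __init__(self, lmbda=0.0):
        self.lmbda = lmbda
    def mult(self, mat, x, y):
        xa = x.getArray(readonly=True)
        ya = y.getArray()
        ya[:] = A_np @ xa - self.lmbda * xa

T = PETSc.Mat().createPython([n, n], context=TShellCtx(), comm=PETSc.COMM_SELF)
T.setUp()

def form_function(nep, lmbda, F, P):
    # Store lambda in *this* matrix's context, not in a global variable;
    # NLEIGS duplicates T internally and each copy needs its own lambda.
    F.getPythonContext().lmbda = lmbda

nep = SLEPc.NEP().create(PETSc.COMM_SELF)
nep.setType(SLEPc.NEP.Type.NLEIGS)   # NLEIGS does not need a Jacobian callback
nep.setFunction(form_function, T)    # assumed to mirror NEPSetFunction(nep, T, T, ...)
nep.setFromOptions()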
> > > > Kenneth Hall > > ATTACHMENTS: > > test_nep.f90 > > code_output > > > > > From Roland.Richter at empa.ch Wed Oct 11 01:44:48 2023 From: Roland.Richter at empa.ch (Richter, Roland) Date: Wed, 11 Oct 2023 06:44:48 +0000 Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov> References: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev> <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov> Message-ID: Hei, Thank you very much for the answer! I looked it up, but petsc.org seems to be a bit unstable here, quite often I can't reach petsc.org. Regards, Roland Richter -----Urspr?ngliche Nachricht----- Von: Satish Balay Gesendet: mandag 9. oktober 2023 17:29 An: Barry Smith Cc: Richter, Roland ; petsc-users at mcs.anl.gov Betreff: Re: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails Will note - OneAPI MPI usage is documented at https://petsc.org/release/install/install/#mpi Satish On Mon, 9 Oct 2023, Barry Smith wrote: > > Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) and > > export I_MPI_CC=icx > export I_MPI_CXX=icpx > export I_MPI_F90=ifx > > > > On Oct 9, 2023, at 8:32 AM, Richter, Roland wrote: > > > > Hei, > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc -cc=icx" as C-compiler, as described by https://stackoverflow.com/a/76362396. Therefore, I adapted the configure-line as follow: > > > > ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=no --download-p4est=yes --download-sowing=y es --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=1 > > > > The configuration, however, fails with > > > > The CMAKE_C_COMPILER: > > > > mpiicc -cc=icx > > > > is not a full path and was not found in the PATH > > > > for all additional modules which use a cmake-based configuration approach (such as OPENCASCADE). How could I solve that problem? > > > > Thank you! > > Regards, > > Roland Richter > > > > -------------- next part -------------- A non-text attachment was scrubbed... 
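Putting Barry's suggestion above into a concrete form (the idea being that CMake-configured subpackages then see a plain mpiicc/mpiicpc/mpiifort executable on PATH, while the environment variables steer Intel MPI to the oneAPI compilers); the trailing options are the ones from the original configure line and are left elided here:

export I_MPI_CC=icx
export I_MPI_CXX=icpx
export I_MPI_F90=ifx

./configure --prefix=/media/storage/local_opt/petsc \
    --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
    ...   # remaining options unchanged from the original command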
Name: smime.p7s Type: application/pkcs7-signature Size: 7926 bytes Desc: not available URL: From thanasis.boutsikakis at corintis.com Wed Oct 11 01:58:18 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 11 Oct 2023 08:58:18 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> Message-ID: <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code """Experimenting with PETSc mat-mat multiplication""" import numpy as np from firedrake import COMM_WORLD from firedrake.petsc import PETSc from utilities import Print nproc = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. Returns: PETSc mat: PETSc mpi matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix # -------------------------------------------- # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi # A' = Phi.T * A * Phi # [k x k] <- [k x m] x [m x m] x [m x k] # -------------------------------------------- m, k = 100, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, m)) Phi_np = np.random.randint(low=0, high=6, size=(m, k)) # -------------------------------------------- # TEST: Galerking projection of numpy matrices A_np and Phi_np # -------------------------------------------- Aprime_np = Phi_np.T @ A_np @ Phi_np # Create A as an mpi matrix distributed on each process A = create_petsc_matrix(A_np, sparse=True) # Create Phi as an mpi matrix distributed on each process Phi = create_petsc_matrix(Phi_np, sparse=True) # Create an empty PETSc matrix object to store the result of the PtAP operation. # This will hold the result A' = Phi.T * A * Phi after the computation. 
A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) # Perform the PtAP (Phi Transpose times A times Phi) operation. # In mathematical terms, this operation is A' = Phi.T * A * Phi. # A_prime will store the result of the operation. A_prime = A.ptap(Phi) I got Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in A_prime = A.ptap(Phi) A_prime = A.ptap(Phi) ^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap A_prime = A.ptap(Phi) ^^^^^^^^^^^ ^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap petsc4py.PETSc.Error: error code 60 [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [0] Nonconforming object sizes [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 petsc4py.PETSc.Error: error code 60 [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [1] Nonconforming object sizes [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 petsc4py.PETSc.Error: error code 60 [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 [2] Nonconforming object sizes [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 > On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: > > I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). 
> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. > On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. > On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. > I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). > > Thanks, > Pierre > >> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >> >> This looks like a false positive or there is some subtle bug here that we are not seeing. >> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >> >> Mark >> >> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>> Hi all, >>>> >>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>> >>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>> >>> Thanks, >>> >>> Matt >>> >>>> """Experimenting with PETSc mat-mat multiplication""" >>>> >>>> import time >>>> >>>> import numpy as np >>>> from colorama import Fore >>>> from firedrake import COMM_SELF, COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from mpi4py import MPI >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> from utilities import Print >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> def create_petsc_matrix(input_array, sparse=True): >>>> """Create a PETSc matrix from an input_array >>>> >>>> Args: >>>> input_array (np array): Input array >>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>> >>>> Returns: >>>> PETSc mat: PETSc mpi matrix >>>> """ >>>> # Check if input_array is 1D and reshape if necessary >>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>> global_rows, global_cols = input_array.shape >>>> size = ((None, global_rows), (global_cols, global_cols)) >>>> >>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>> if sparse: >>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>> else: >>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>> matrix.setUp() >>>> >>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>> >>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>> # Calculate the correct row in the array for the current process >>>> row_in_array = counter + local_rows_start >>>> matrix.setValues( >>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>> ) >>>> >>>> # Assembly the matrix to compute the final structure >>>> matrix.assemblyBegin() >>>> matrix.assemblyEnd() >>>> >>>> return matrix >>>> >>>> # -------------------------------------------- >>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>> # A' = Phi.T * A * Phi >>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>> # -------------------------------------------- >>>> >>>> m, k = 100, 7 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> >>>> # -------------------------------------------- >>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>> # -------------------------------------------- >>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>> Print(f"{Aprime_np}") >>>> >>>> # Create A as an mpi matrix distributed on each process >>>> A = create_petsc_matrix(A_np, sparse=False) >>>> >>>> # Create Phi as an mpi matrix distributed on each process >>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>> >>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>> >>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>> # A_prime will store the result of the operation. 
>>>> A_prime = A.ptap(Phi) >>>> >>>> Here is the error >>>> >>>> MATRIX mpiaij A [100x100] >>>> Assembled >>>> >>>> Partitioning for A: >>>> Rank 0: Rows [0, 34) >>>> Rank 1: Rows [34, 67) >>>> Rank 2: Rows [67, 100) >>>> >>>> MATRIX mpiaij Phi [100x7] >>>> Assembled >>>> >>>> Partitioning for Phi: >>>> Rank 0: Rows [0, 34) >>>> Rank 1: Rows [34, 67) >>>> Rank 2: Rows [67, 100) >>>> >>>> Traceback (most recent call last): >>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>> A_prime = A.ptap(Phi) >>>> ^^^^^^^^^^^ >>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>> petsc4py.PETSc.Error: error code 60 >>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>> [0] Nonconforming object sizes >>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>> >>>> Any thoughts? >>>> >>>> Thanks, >>>> Thanos >>>> >>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>> >>>>> This works Pierre. Amazing input, thanks a lot! >>>>> >>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>> >>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>> A_prime = A.ptap(Phi) >>>>>> >>>>>> Thanks, >>>>>> Pierre >>>>>> >>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>> >>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>> >>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>> >>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>> >>>>>>> import time >>>>>>> >>>>>>> import numpy as np >>>>>>> from colorama import Fore >>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>> from firedrake.petsc import PETSc >>>>>>> from mpi4py import MPI >>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>> >>>>>>> from utilities import ( >>>>>>> Print, >>>>>>> create_petsc_matrix, >>>>>>> print_matrix_partitioning, >>>>>>> ) >>>>>>> >>>>>>> nproc = COMM_WORLD.size >>>>>>> rank = COMM_WORLD.rank >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>> # A' = Phi.T * A * Phi >>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>> # -------------------------------------------- >>>>>>> >>>>>>> m, k = 11, 7 >>>>>>> # Generate the random numpy matrices >>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>> # -------------------------------------------- >>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>> Print(f"{Aprime_np}") >>>>>>> >>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>> >>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>> >>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>> >>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>> # A_prime will store the result of the operation. >>>>>>> Phi.PtAP(A, A_prime) >>>>>>> >>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>> >>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Pierre >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>> >>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>> >>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>> >>>>>>>>> Args: >>>>>>>>> input_array (np array): Input array >>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>> >>>>>>>>> Returns: >>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>> """ >>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>> >>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>> >>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>> if sparse: >>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>> else: >>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>> matrix.setUp() >>>>>>>>> >>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>> >>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>> matrix.setValues( >>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>> ) >>>>>>>>> >>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>> matrix.assemblyBegin() >>>>>>>>> matrix.assemblyEnd() >>>>>>>>> >>>>>>>>> return matrix >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>> >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>> >>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>> >>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>> >>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>> >>>>>>>>>> import time >>>>>>>>>> >>>>>>>>>> import numpy as np >>>>>>>>>> from colorama import Fore >>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>> from mpi4py import MPI >>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>> >>>>>>>>>> from utilities import ( >>>>>>>>>> Print, >>>>>>>>>> create_petsc_matrix, >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>> >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> >>>>>>>>>> m, k = 11, 7 >>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>> >>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>> >>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>> >>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>> >>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... 
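The "Nonconforming object sizes" report quoted above ("Acol (0, 100) != Prow (0,34)") is PETSc comparing the column ownership range of A against the row ownership range of Phi on each rank. Before calling a product it can help to dump both layouts; the following is a minimal sketch, assuming A and Phi are the matrices built by the quoted create_petsc_matrix helper, and uses only standard petsc4py introspection calls:

# Minimal layout dump, assuming A and Phi come from the quoted script.
# For A.ptap(Phi) to be well defined, A's column ownership range must
# line up with Phi's row ownership range on every rank -- these are the
# "Acol" and "Prow" pairs printed in the error above.
from petsc4py import PETSc


def dump_layout(mat, name):
    M, N = mat.getSize()                  # global rows, columns
    mloc, nloc = mat.getLocalSize()       # local rows, columns
    rows = mat.getOwnershipRange()        # owned row range [start, end)
    cols = mat.getOwnershipRangeColumn()  # owned column range [start, end)
    PETSc.Sys.syncPrint(
        f"[{PETSc.COMM_WORLD.rank}] {name}: global {M}x{N}, "
        f"local {mloc}x{nloc}, rows {rows}, cols {cols}"
    )
    PETSc.Sys.syncFlush()


dump_layout(A, "A")
dump_layout(Phi, "Phi")

With the size tuple used in the quoted helper, A's column ranges come out stacked rank by rank instead of matching Phi's 34/33/33 row split, which is what the error message is pointing at.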
URL: From thanasis.boutsikakis at corintis.com Wed Oct 11 02:04:29 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 11 Oct 2023 09:04:29 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> Message-ID: Furthermore, I tried to perform the Galerkin projection in two steps by substituting > A_prime = A.ptap(Phi) With AL = Phi.transposeMatMult(A) A_prime = AL.matMult(Phi) And running this with 3 procs, results to the false creation of a matrix AL that has 3 times bigger dimensions that it should (A is of size 100x100 and Phi of size 100x7): MATRIX mpiaij AL [21x300] Assembled Partitioning for AL: Rank 0: Rows [0, 7) Rank 1: Rows [7, 14) Rank 2: Rows [14, 21) And naturally, in another dimension incompatibility: Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 85, in Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 85, in A_prime = AL.matMult(Phi) A_prime = AL.matMult(Phi) ^^^^^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1492, in petsc4py.PETSc.Mat.matMult ^^^^^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1492, in petsc4py.PETSc.Mat.matMult Traceback (most recent call last): File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 85, in petsc4py.PETSc.Error: error code 60 [2] MatMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10053 [2] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9976 [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:421 [2] Nonconforming object sizes [2] Matrix dimensions of A and B are incompatible for MatProductType AB: A 21x300, B 100x7 Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 petsc4py.PETSc.Error: error code 60 [1] MatMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10053 [1] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9976 [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:421 [1] Nonconforming object sizes [1] Matrix dimensions of A and B are incompatible for MatProductType AB: A 21x300, B 100x7 Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 A_prime = AL.matMult(Phi) ^^^^^^^^^^^^^^^ File "petsc4py/PETSc/Mat.pyx", line 1492, in petsc4py.PETSc.Mat.matMult petsc4py.PETSc.Error: error code 60 [0] MatMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10053 [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9976 [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:421 [0] Nonconforming object sizes [0] Matrix dimensions of A and B are incompatible for MatProductType AB: A 21x300, B 100x7 
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > On 10 Oct 2023, at 23:33, Thanasis Boutsikakis wrote: > > Hi all, > > Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). > > """Experimenting with PETSc mat-mat multiplication""" > > import time > > import numpy as np > from colorama import Fore > from firedrake import COMM_SELF, COMM_WORLD > from firedrake.petsc import PETSc > from mpi4py import MPI > from numpy.testing import assert_array_almost_equal > > from utilities import Print > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > PETSc mat: PETSc mpi matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 100, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") > Print(f"{Aprime_np}") > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=False) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=False) > > # Create an empty PETSc matrix object to store the result of the PtAP operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. 
> A_prime = A.ptap(Phi) > > Here is the error > > MATRIX mpiaij A [100x100] > Assembled > > Partitioning for A: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > MATRIX mpiaij Phi [100x7] > Assembled > > Partitioning for Phi: > Rank 0: Rows [0, 34) > Rank 1: Rows [34, 67) > Rank 2: Rows [67, 100) > > Traceback (most recent call last): > File "/Users/boutsitron/work/galerkin_projection.py", line 87, in > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > petsc4py.PETSc.Error: error code 60 > [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [0] Nonconforming object sizes > [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) > Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > > Any thoughts? > > Thanks, > Thanos > >> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis wrote: >> >> This works Pierre. Amazing input, thanks a lot! >> >>> On 5 Oct 2023, at 14:17, Pierre Jolivet wrote: >>> >>> Not a petsc4py expert here, but you may to try instead: >>> A_prime = A.ptap(Phi) >>> >>> Thanks, >>> Pierre >>> >>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis wrote: >>>> >>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>> >>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>> [0]PETSC ERROR: to get more information on the crash. >>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>> >>>> """Experimenting with PETSc mat-mat multiplication""" >>>> >>>> import time >>>> >>>> import numpy as np >>>> from colorama import Fore >>>> from firedrake import COMM_SELF, COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from mpi4py import MPI >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> from utilities import ( >>>> Print, >>>> create_petsc_matrix, >>>> print_matrix_partitioning, >>>> ) >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> # -------------------------------------------- >>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>> # A' = Phi.T * A * Phi >>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>> # -------------------------------------------- >>>> >>>> m, k = 11, 7 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> >>>> # -------------------------------------------- >>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>> # -------------------------------------------- >>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>> Print(f"{Aprime_np}") >>>> >>>> # Create A as an mpi matrix distributed on each process >>>> A = create_petsc_matrix(A_np, sparse=False) >>>> >>>> # Create Phi as an mpi matrix distributed on each process >>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>> >>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>> >>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>> # A_prime will store the result of the operation. >>>> Phi.PtAP(A, A_prime) >>>> >>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet wrote: >>>>> >>>>> How about using ptap which will use MatPtAP? >>>>> It will be more efficient (and it will help you bypass the issue). >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis wrote: >>>>>> >>>>>> Sorry, forgot function create_petsc_matrix() >>>>>> >>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>> """Create a PETSc matrix from an input_array >>>>>> >>>>>> Args: >>>>>> input_array (np array): Input array >>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>> >>>>>> Returns: >>>>>> PETSc mat: PETSc matrix >>>>>> """ >>>>>> # Check if input_array is 1D and reshape if necessary >>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>> global_rows, global_cols = input_array.shape >>>>>> >>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>> >>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>> if sparse: >>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>> else: >>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>> matrix.setUp() >>>>>> >>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>> >>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>> # Calculate the correct row in the array for the current process >>>>>> row_in_array = counter + local_rows_start >>>>>> matrix.setValues( >>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>> ) >>>>>> >>>>>> # Assembly the matrix to compute the final structure >>>>>> matrix.assemblyBegin() >>>>>> matrix.assemblyEnd() >>>>>> >>>>>> return matrix >>>>>> >>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>> >>>>>>> Phi.transposeMatMult(A, A1) >>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>> [0] No support for this operation for this object type >>>>>>> [0] Call MatProductCreate() first >>>>>>> >>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>> >>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>> >>>>>>> import time >>>>>>> >>>>>>> import numpy as np >>>>>>> from colorama import Fore >>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>> from firedrake.petsc import PETSc >>>>>>> from mpi4py import MPI >>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>> >>>>>>> from utilities import ( >>>>>>> Print, >>>>>>> create_petsc_matrix, >>>>>>> ) >>>>>>> >>>>>>> nproc = COMM_WORLD.size >>>>>>> rank = COMM_WORLD.rank >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>> # A' = Phi.T * A * Phi >>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>> # -------------------------------------------- >>>>>>> >>>>>>> m, k = 11, 7 >>>>>>> # Generate the random numpy matrices >>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>> >>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>> A = create_petsc_matrix(A_np) >>>>>>> >>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>> >>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>> >>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>> Phi.transposeMatMult(A, A1) >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed Oct 11 02:04:51 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 11 Oct 2023 09:04:51 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> Message-ID: That?s because: size = ((None, global_rows), (global_cols, global_cols)) should be: size = ((None, global_rows), (None, global_cols)) Then, it will work. $ ~/repo/petsc/arch-darwin-c-debug-real/bin/mpirun -n 4 python3.12 test.py && echo $? 0 Thanks, Pierre > On 11 Oct 2023, at 8:58?AM, Thanasis Boutsikakis wrote: > > Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code > > """Experimenting with PETSc mat-mat multiplication""" > > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > > from utilities import Print > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > PETSc mat: PETSc mpi matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > # -------------------------------------------- > # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi > # A' = Phi.T * A * Phi > # [k x k] <- [k x m] x [m x m] x [m x k] > # -------------------------------------------- > > m, k = 100, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, m)) > Phi_np = np.random.randint(low=0, high=6, size=(m, k)) > > # -------------------------------------------- > # TEST: Galerking projection of numpy matrices A_np and Phi_np > # -------------------------------------------- > Aprime_np = Phi_np.T @ A_np @ Phi_np > > # Create A as an mpi matrix distributed on each process > A = create_petsc_matrix(A_np, sparse=True) > > # Create Phi as an mpi matrix distributed on each process > Phi = create_petsc_matrix(Phi_np, sparse=True) > > # Create an empty PETSc matrix object to store the result of the PtAP operation. > # This will hold the result A' = Phi.T * A * Phi after the computation. > A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) > > # Perform the PtAP (Phi Transpose times A times Phi) operation. > # In mathematical terms, this operation is A' = Phi.T * A * Phi. > # A_prime will store the result of the operation. 
> A_prime = A.ptap(Phi) > > I got > > Traceback (most recent call last): > File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in > Traceback (most recent call last): > File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in > Traceback (most recent call last): > File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in > A_prime = A.ptap(Phi) > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > A_prime = A.ptap(Phi) > ^^^^^^^^^^^ > ^^^^^^^^^^^ > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap > petsc4py.PETSc.Error: error code 60 > [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [0] Nonconforming object sizes > [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) > Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 > petsc4py.PETSc.Error: error code 60 > [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [1] Nonconforming object sizes > [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) > Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 > petsc4py.PETSc.Error: error code 60 > [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 > [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 > [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 > [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 > [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 > [2] Nonconforming object sizes > [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) > Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 > >> On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: >> >> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). >> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. 
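When the basis Phi really is stored as a dense tall-skinny matrix, one low-risk variant of the projection -- a hedged sketch, not the solution adopted later in the thread -- is to keep A as MPIAIJ, form the tall product B = A * Phi with matMult (sparse-times-dense is a well-supported combination), and reduce the small k x k block by hand with a local GEMM plus an allreduce. The names A, Phi, m and k mirror the quoted script; treating Phi as MPIDENSE and having mpi4py available are assumptions, and the k x k result comes back as a replicated numpy array rather than a PETSc Mat:

# Hedged sketch: projection with a dense basis Phi (MPIDENSE, m x k) and a
# sparse operator A (MPIAIJ, m x m), both created with
# size=((None, rows), (None, cols)) so that PETSc owns the layouts.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# B = A * Phi: sparse-times-dense product, result is MPIDENSE of size m x k.
B = A.matMult(Phi)

# A' = Phi^T * B: the result is only k x k, so reduce it by hand --
# multiply the locally owned row blocks and sum the k x k contributions.
Phi_local = Phi.getDenseArray()        # local rows of Phi, shape (m_loc, k)
B_local = B.getDenseArray()            # local rows of B,   shape (m_loc, k)
A_prime_local = Phi_local.T @ B_local  # this rank's k x k contribution
A_prime = np.empty_like(A_prime_local)
comm.Allreduce(A_prime_local, A_prime, op=MPI.SUM)  # replicated k x k array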
>> On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. >> On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. >> I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). >> >> Thanks, >> Pierre >> >>> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >>> >>> This looks like a false positive or there is some subtle bug here that we are not seeing. >>> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >>> >>> Mark >>> >>> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>>> Hi all, >>>>> >>>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>>> >>>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>> >>>>> import time >>>>> >>>>> import numpy as np >>>>> from colorama import Fore >>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>> from firedrake.petsc import PETSc >>>>> from mpi4py import MPI >>>>> from numpy.testing import assert_array_almost_equal >>>>> >>>>> from utilities import Print >>>>> >>>>> nproc = COMM_WORLD.size >>>>> rank = COMM_WORLD.rank >>>>> >>>>> def create_petsc_matrix(input_array, sparse=True): >>>>> """Create a PETSc matrix from an input_array >>>>> >>>>> Args: >>>>> input_array (np array): Input array >>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>> >>>>> Returns: >>>>> PETSc mat: PETSc mpi matrix >>>>> """ >>>>> # Check if input_array is 1D and reshape if necessary >>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>> global_rows, global_cols = input_array.shape >>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>> >>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>> if sparse: >>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>> else: >>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>> matrix.setUp() >>>>> >>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>> >>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>> # Calculate the correct row in the array for the current process >>>>> row_in_array = counter + local_rows_start >>>>> matrix.setValues( >>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>> ) >>>>> >>>>> # Assembly the matrix to compute the final structure >>>>> matrix.assemblyBegin() >>>>> matrix.assemblyEnd() >>>>> >>>>> return matrix >>>>> >>>>> # -------------------------------------------- >>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>> # A' = Phi.T * A * Phi >>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>> # -------------------------------------------- >>>>> >>>>> m, k = 100, 7 >>>>> # Generate the random numpy matrices >>>>> np.random.seed(0) # sets the seed to 0 >>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>> >>>>> # -------------------------------------------- >>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>> # -------------------------------------------- >>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>> Print(f"{Aprime_np}") >>>>> >>>>> # Create A as an mpi matrix distributed on each process >>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>> >>>>> # Create Phi as an mpi matrix distributed on each process >>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>> >>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>> >>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>> # A_prime will store the result of the operation. 
>>>>> A_prime = A.ptap(Phi) >>>>> >>>>> Here is the error >>>>> >>>>> MATRIX mpiaij A [100x100] >>>>> Assembled >>>>> >>>>> Partitioning for A: >>>>> Rank 0: Rows [0, 34) >>>>> Rank 1: Rows [34, 67) >>>>> Rank 2: Rows [67, 100) >>>>> >>>>> MATRIX mpiaij Phi [100x7] >>>>> Assembled >>>>> >>>>> Partitioning for Phi: >>>>> Rank 0: Rows [0, 34) >>>>> Rank 1: Rows [34, 67) >>>>> Rank 2: Rows [67, 100) >>>>> >>>>> Traceback (most recent call last): >>>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>>> A_prime = A.ptap(Phi) >>>>> ^^^^^^^^^^^ >>>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>>> petsc4py.PETSc.Error: error code 60 >>>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>>> [0] Nonconforming object sizes >>>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>>> >>>>> Any thoughts? >>>>> >>>>> Thanks, >>>>> Thanos >>>>> >>>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>>> >>>>>> This works Pierre. Amazing input, thanks a lot! >>>>>> >>>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>>> >>>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>>> A_prime = A.ptap(Phi) >>>>>>> >>>>>>> Thanks, >>>>>>> Pierre >>>>>>> >>>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>>> >>>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>>> >>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>>> >>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>> >>>>>>>> import time >>>>>>>> >>>>>>>> import numpy as np >>>>>>>> from colorama import Fore >>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>> from firedrake.petsc import PETSc >>>>>>>> from mpi4py import MPI >>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>> >>>>>>>> from utilities import ( >>>>>>>> Print, >>>>>>>> create_petsc_matrix, >>>>>>>> print_matrix_partitioning, >>>>>>>> ) >>>>>>>> >>>>>>>> nproc = COMM_WORLD.size >>>>>>>> rank = COMM_WORLD.rank >>>>>>>> >>>>>>>> # -------------------------------------------- >>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>> # A' = Phi.T * A * Phi >>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>> # -------------------------------------------- >>>>>>>> >>>>>>>> m, k = 11, 7 >>>>>>>> # Generate the random numpy matrices >>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>> >>>>>>>> # -------------------------------------------- >>>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>>> # -------------------------------------------- >>>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>>> Print(f"{Aprime_np}") >>>>>>>> >>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>>> >>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>>> >>>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>>> >>>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>>> # A_prime will store the result of the operation. >>>>>>>> Phi.PtAP(A, A_prime) >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>>> >>>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Pierre >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>> >>>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>>> >>>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>>> >>>>>>>>>> Args: >>>>>>>>>> input_array (np array): Input array >>>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>>> >>>>>>>>>> Returns: >>>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>>> """ >>>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>>> >>>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>>> >>>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>>> if sparse: >>>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>>> else: >>>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>>> matrix.setUp() >>>>>>>>>> >>>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>>> >>>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>>> matrix.setValues( >>>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>>> matrix.assemblyBegin() >>>>>>>>>> matrix.assemblyEnd() >>>>>>>>>> >>>>>>>>>> return matrix >>>>>>>>>> >>>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>>> >>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>>> >>>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>>> >>>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>>> >>>>>>>>>>> import time >>>>>>>>>>> >>>>>>>>>>> import numpy as np >>>>>>>>>>> from colorama import Fore >>>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>>> from mpi4py import MPI >>>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>>> >>>>>>>>>>> from utilities import ( >>>>>>>>>>> Print, >>>>>>>>>>> create_petsc_matrix, >>>>>>>>>>> ) >>>>>>>>>>> >>>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>>> >>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> m, k = 11, 7 >>>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>>> >>>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>>> >>>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>>> >>>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>>> >>>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Wed Oct 11 02:13:28 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 11 Oct 2023 09:13:28 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> Message-ID: <74C597F2-65FA-4CCF-9611-C1C196E4C4C0@corintis.com> Very good catch Pierre, thanks a lot! This made everything work: the two-step process and the ptap(). I mistakenly thought that I should not let the local number of columns to be None, since the matrix is only partitioned row-wise. Could you please explain what happened because of my setting the local column number so that I get the philosophy behind this partitioning? Thanks again, Thanos > On 11 Oct 2023, at 09:04, Pierre Jolivet wrote: > > That?s because: > size = ((None, global_rows), (global_cols, global_cols)) > should be: > size = ((None, global_rows), (None, global_cols)) > Then, it will work. > $ ~/repo/petsc/arch-darwin-c-debug-real/bin/mpirun -n 4 python3.12 test.py && echo $? 
> 0 > > Thanks, > Pierre > >> On 11 Oct 2023, at 8:58?AM, Thanasis Boutsikakis wrote: >> >> Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code >> >> """Experimenting with PETSc mat-mat multiplication""" >> >> import numpy as np >> from firedrake import COMM_WORLD >> from firedrake.petsc import PETSc >> >> from utilities import Print >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> >> def create_petsc_matrix(input_array, sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >> >> Returns: >> PETSc mat: PETSc mpi matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> # -------------------------------------------- >> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >> # A' = Phi.T * A * Phi >> # [k x k] <- [k x m] x [m x m] x [m x k] >> # -------------------------------------------- >> >> m, k = 100, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, m)) >> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> # -------------------------------------------- >> # TEST: Galerking projection of numpy matrices A_np and Phi_np >> # -------------------------------------------- >> Aprime_np = Phi_np.T @ A_np @ Phi_np >> >> # Create A as an mpi matrix distributed on each process >> A = create_petsc_matrix(A_np, sparse=True) >> >> # Create Phi as an mpi matrix distributed on each process >> Phi = create_petsc_matrix(Phi_np, sparse=True) >> >> # Create an empty PETSc matrix object to store the result of the PtAP operation. >> # This will hold the result A' = Phi.T * A * Phi after the computation. >> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) >> >> # Perform the PtAP (Phi Transpose times A times Phi) operation. >> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >> # A_prime will store the result of the operation. 
>> A_prime = A.ptap(Phi) >> >> I got >> >> Traceback (most recent call last): >> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >> Traceback (most recent call last): >> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >> Traceback (most recent call last): >> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >> A_prime = A.ptap(Phi) >> A_prime = A.ptap(Phi) >> ^^^^^^^^^^^ >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> A_prime = A.ptap(Phi) >> ^^^^^^^^^^^ >> ^^^^^^^^^^^ >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >> petsc4py.PETSc.Error: error code 60 >> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [0] Nonconforming object sizes >> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >> petsc4py.PETSc.Error: error code 60 >> [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [1] Nonconforming object sizes >> [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) >> Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 >> petsc4py.PETSc.Error: error code 60 >> [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >> [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >> [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >> [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >> [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >> [2] Nonconforming object sizes >> [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) >> Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 >> >>> On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: >>> >>> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). >>> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. 
>>> On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. >>> On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. >>> I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). >>> >>> Thanks, >>> Pierre >>> >>>> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >>>> >>>> This looks like a false positive or there is some subtle bug here that we are not seeing. >>>> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >>>> >>>> Mark >>>> >>>> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>>>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>>>> Hi all, >>>>>> >>>>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>>>> >>>>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>> >>>>>> import time >>>>>> >>>>>> import numpy as np >>>>>> from colorama import Fore >>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from mpi4py import MPI >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> from utilities import Print >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> def create_petsc_matrix(input_array, sparse=True): >>>>>> """Create a PETSc matrix from an input_array >>>>>> >>>>>> Args: >>>>>> input_array (np array): Input array >>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>> >>>>>> Returns: >>>>>> PETSc mat: PETSc mpi matrix >>>>>> """ >>>>>> # Check if input_array is 1D and reshape if necessary >>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>> global_rows, global_cols = input_array.shape >>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>> >>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>> if sparse: >>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>> else: >>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>> matrix.setUp() >>>>>> >>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>> >>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>> # Calculate the correct row in the array for the current process >>>>>> row_in_array = counter + local_rows_start >>>>>> matrix.setValues( >>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>> ) >>>>>> >>>>>> # Assembly the matrix to compute the final structure >>>>>> matrix.assemblyBegin() >>>>>> matrix.assemblyEnd() >>>>>> >>>>>> return matrix >>>>>> >>>>>> # -------------------------------------------- >>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>> # A' = Phi.T * A * Phi >>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>> # -------------------------------------------- >>>>>> >>>>>> m, k = 100, 7 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> >>>>>> # -------------------------------------------- >>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>> # -------------------------------------------- >>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>> Print(f"{Aprime_np}") >>>>>> >>>>>> # Create A as an mpi matrix distributed on each process >>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>> >>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>> >>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>> >>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>> # A_prime will store the result of the operation. 
>>>>>> A_prime = A.ptap(Phi) >>>>>> >>>>>> Here is the error >>>>>> >>>>>> MATRIX mpiaij A [100x100] >>>>>> Assembled >>>>>> >>>>>> Partitioning for A: >>>>>> Rank 0: Rows [0, 34) >>>>>> Rank 1: Rows [34, 67) >>>>>> Rank 2: Rows [67, 100) >>>>>> >>>>>> MATRIX mpiaij Phi [100x7] >>>>>> Assembled >>>>>> >>>>>> Partitioning for Phi: >>>>>> Rank 0: Rows [0, 34) >>>>>> Rank 1: Rows [34, 67) >>>>>> Rank 2: Rows [67, 100) >>>>>> >>>>>> Traceback (most recent call last): >>>>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>>>> A_prime = A.ptap(Phi) >>>>>> ^^^^^^^^^^^ >>>>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>>>> petsc4py.PETSc.Error: error code 60 >>>>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>>>> [0] Nonconforming object sizes >>>>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>>>> >>>>>> Any thoughts? >>>>>> >>>>>> Thanks, >>>>>> Thanos >>>>>> >>>>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>>>> >>>>>>> This works Pierre. Amazing input, thanks a lot! >>>>>>> >>>>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>>>> >>>>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>>>> A_prime = A.ptap(Phi) >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Pierre >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>>>> >>>>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>>>> >>>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>>>> >>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>> >>>>>>>>> import time >>>>>>>>> >>>>>>>>> import numpy as np >>>>>>>>> from colorama import Fore >>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>> from mpi4py import MPI >>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>> >>>>>>>>> from utilities import ( >>>>>>>>> Print, >>>>>>>>> create_petsc_matrix, >>>>>>>>> print_matrix_partitioning, >>>>>>>>> ) >>>>>>>>> >>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>> >>>>>>>>> # -------------------------------------------- >>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>> # -------------------------------------------- >>>>>>>>> >>>>>>>>> m, k = 11, 7 >>>>>>>>> # Generate the random numpy matrices >>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>> >>>>>>>>> # -------------------------------------------- >>>>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>>>> # -------------------------------------------- >>>>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>>>> Print(f"{Aprime_np}") >>>>>>>>> >>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>>>> >>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>>>> >>>>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>>>> >>>>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>>>> # A_prime will store the result of the operation. >>>>>>>>> Phi.PtAP(A, A_prime) >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>>>> >>>>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Pierre >>>>>>>>>> >>>>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>>> >>>>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>>>> >>>>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>>>> >>>>>>>>>>> Args: >>>>>>>>>>> input_array (np array): Input array >>>>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>>>> >>>>>>>>>>> Returns: >>>>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>>>> """ >>>>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>>>> >>>>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>>>> >>>>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>>>> if sparse: >>>>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>>>> else: >>>>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>>>> matrix.setUp() >>>>>>>>>>> >>>>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>>>> >>>>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>>>> matrix.setValues( >>>>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>>>> ) >>>>>>>>>>> >>>>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>>>> matrix.assemblyBegin() >>>>>>>>>>> matrix.assemblyEnd() >>>>>>>>>>> >>>>>>>>>>> return matrix >>>>>>>>>>> >>>>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi everyone, >>>>>>>>>>>> >>>>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>>>> >>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>>>> >>>>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>>>> >>>>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>>>> >>>>>>>>>>>> import time >>>>>>>>>>>> >>>>>>>>>>>> import numpy as np >>>>>>>>>>>> from colorama import Fore >>>>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>>>> from mpi4py import MPI >>>>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>>>> >>>>>>>>>>>> from utilities import ( >>>>>>>>>>>> Print, >>>>>>>>>>>> create_petsc_matrix, >>>>>>>>>>>> ) >>>>>>>>>>>> >>>>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>>>> >>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>> >>>>>>>>>>>> m, k = 11, 7 >>>>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>>>> >>>>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>>>> >>>>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>>>> >>>>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>>>> >>>>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Roland.Richter at empa.ch Wed Oct 11 03:21:55 2023 From: Roland.Richter at empa.ch (Richter, Roland) Date: Wed, 11 Oct 2023 08:21:55 +0000 Subject: [petsc-users] Compilation failure of PETSc with "The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit" Message-ID: Hei, following my last question I managed to configure PETSc with Intel MPI and Intel OneAPI using the following configure-line: ./configure --prefix=/media/storage/local_opt/petsc --with-scalar-type=complex --with-cc=mpiicc --with-cxx=mpiicpc --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native -mavx2" --with-fc=mpiifort --with-pic=true --with-mpi=true --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ --with-openmp=true --download-hdf5=yes --download-netcdf=yes --download-chaco=no --download-metis=yes --download-slepc=yes --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost --download-opencascade=yes --with-fftw=1 --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 --download-muparser=yes --download-p4est=yes --download-sowing=yes --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=0 Now, however, compilation fails with the following error: /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-int erfaces/petscpc.h90(699): error #6623: The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit. [PCGASMCREATESUBDOMAINS2D] subroutine PCGASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,j,z) -----------------^ /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-int erfaces/petscpc.h90(1199): error #6623: The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit. [PCASMCREATESUBDOMAINS2D] subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) -----------------^ I'm on the latest version of origin/main, but can't figure out how to fix that issue by myself. Therefore, I'd appreciate additional insight. Thanks! Regards, Roland Richter -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: compilation_log.log Type: application/octet-stream Size: 18866 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 7926 bytes Desc: not available URL: From pierre at joliv.et Wed Oct 11 03:29:26 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 11 Oct 2023 10:29:26 +0200 Subject: [petsc-users] Galerkin projection using petsc4py In-Reply-To: <74C597F2-65FA-4CCF-9611-C1C196E4C4C0@corintis.com> References: <6B372CC8-49CC-481D-9236-2FD76F1A3582@corintis.com> <27B7EB95-6621-405A-80F0-1C7F03446152@corintis.com> <78AE370F-4E30-41A3-809B-ECFB03634769@corintis.com> <3C8FA7CA-63CB-49F2-8756-535D7FC657C3@joliv.et> <327E3AAA-1AD0-4051-B977-55420DE24067@corintis.com> <74C597F2-65FA-4CCF-9611-C1C196E4C4C0@corintis.com> Message-ID: <80B91AD7-7FC5-46FF-9FE0-B3205719C6CE@joliv.et> > On 11 Oct 2023, at 9:13?AM, Thanasis Boutsikakis wrote: > > Very good catch Pierre, thanks a lot! 
> > This made everything work: the two-step process and the ptap(). I mistakenly thought that I should not let the local number of columns to be None, since the matrix is only partitioned row-wise. Could you please explain what happened because of my setting the local column number so that I get the philosophy behind this partitioning? Hopefully this should make things clearer to you: https://petsc.org/release/manual/mat/#sec-matlayout Thanks, Pierre > Thanks again, > Thanos > >> On 11 Oct 2023, at 09:04, Pierre Jolivet wrote: >> >> That?s because: >> size = ((None, global_rows), (global_cols, global_cols)) >> should be: >> size = ((None, global_rows), (None, global_cols)) >> Then, it will work. >> $ ~/repo/petsc/arch-darwin-c-debug-real/bin/mpirun -n 4 python3.12 test.py && echo $? >> 0 >> >> Thanks, >> Pierre >> >>> On 11 Oct 2023, at 8:58?AM, Thanasis Boutsikakis wrote: >>> >>> Pierre, I see your point, but my experiment shows that it does not even run due to size mismatch, so I don?t see how being sparse would change things here. There must be some kind of problem with the parallel ptap(), because it does run sequentially. In order to test that, I changed the flags of the matrix creation to sparse=True and ran it again. Here is the code >>> >>> """Experimenting with PETSc mat-mat multiplication""" >>> >>> import numpy as np >>> from firedrake import COMM_WORLD >>> from firedrake.petsc import PETSc >>> >>> from utilities import Print >>> >>> nproc = COMM_WORLD.size >>> rank = COMM_WORLD.rank >>> >>> >>> def create_petsc_matrix(input_array, sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>> >>> Returns: >>> PETSc mat: PETSc mpi matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>> >>> # -------------------------------------------- >>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>> # A' = Phi.T * A * Phi >>> # [k x k] <- [k x m] x [m x m] x [m x k] >>> # -------------------------------------------- >>> >>> m, k = 100, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> # -------------------------------------------- >>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>> # -------------------------------------------- >>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>> >>> # Create A as an mpi matrix distributed on each process >>> A = create_petsc_matrix(A_np, sparse=True) >>> >>> # Create Phi as an mpi matrix distributed on each process >>> Phi = create_petsc_matrix(Phi_np, sparse=True) >>> >>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=True) >>> >>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>> # A_prime will store the result of the operation. 
>>> A_prime = A.ptap(Phi) >>> >>> I got >>> >>> Traceback (most recent call last): >>> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >>> Traceback (most recent call last): >>> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >>> Traceback (most recent call last): >>> File "/Users/boutsitron/petsc-experiments/mat_vec_multiplication2.py", line 89, in >>> A_prime = A.ptap(Phi) >>> A_prime = A.ptap(Phi) >>> ^^^^^^^^^^^ >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> A_prime = A.ptap(Phi) >>> ^^^^^^^^^^^ >>> ^^^^^^^^^^^ >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>> petsc4py.PETSc.Error: error code 60 >>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [0] Nonconforming object sizes >>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>> petsc4py.PETSc.Error: error code 60 >>> [1] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [1] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [1] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [1] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [1] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [1] Nonconforming object sizes >>> [1] Matrix local dimensions are incompatible, Acol (100, 200) != Prow (34,67) >>> Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 1 >>> petsc4py.PETSc.Error: error code 60 >>> [2] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>> [2] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>> [2] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>> [2] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>> [2] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>> [2] Nonconforming object sizes >>> [2] Matrix local dimensions are incompatible, Acol (200, 300) != Prow (67,100) >>> Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2 >>> >>>> On 11 Oct 2023, at 07:18, Pierre Jolivet wrote: >>>> >>>> I disagree with what Mark and Matt are saying: your code is fine, the error message is fine, petsc4py is fine (in this instance). 
>>>> It?s not a typical use case of MatPtAP(), which is mostly designed for MatAIJ, not MatDense. >>>> On the one hand, in the MatDense case, indeed there will be a mismatch between the number of columns of A and the number of rows of P, as written in the error message. >>>> On the other hand, there is not much to optimize when computing C = P? A P with everything being dense. >>>> I would just write this as B = A P and then C = P? B (but then you may face the same issue as initially reported, please let us know then). >>>> >>>> Thanks, >>>> Pierre >>>> >>>>> On 11 Oct 2023, at 2:42?AM, Mark Adams wrote: >>>>> >>>>> This looks like a false positive or there is some subtle bug here that we are not seeing. >>>>> Could this be the first time parallel PtAP has been used (and reported) in petsc4py? >>>>> >>>>> Mark >>>>> >>>>> On Tue, Oct 10, 2023 at 8:27?PM Matthew Knepley > wrote: >>>>>> On Tue, Oct 10, 2023 at 5:34?PM Thanasis Boutsikakis > wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> Revisiting my code and the proposed solution from Pierre, I realized this works only in sequential. The reason is that PETSc partitions those matrices only row-wise, which leads to an error due to the mismatch between number of columns of A (non-partitioned) and the number of rows of Phi (partitioned). >>>>>> >>>>>> Are you positive about this? P^T A P is designed to run in this scenario, so either we have a bug or the diagnosis is wrong. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>> >>>>>>> import time >>>>>>> >>>>>>> import numpy as np >>>>>>> from colorama import Fore >>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>> from firedrake.petsc import PETSc >>>>>>> from mpi4py import MPI >>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>> >>>>>>> from utilities import Print >>>>>>> >>>>>>> nproc = COMM_WORLD.size >>>>>>> rank = COMM_WORLD.rank >>>>>>> >>>>>>> def create_petsc_matrix(input_array, sparse=True): >>>>>>> """Create a PETSc matrix from an input_array >>>>>>> >>>>>>> Args: >>>>>>> input_array (np array): Input array >>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>> >>>>>>> Returns: >>>>>>> PETSc mat: PETSc mpi matrix >>>>>>> """ >>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>> global_rows, global_cols = input_array.shape >>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>> >>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>> if sparse: >>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>> else: >>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>> matrix.setUp() >>>>>>> >>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>> >>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>> # Calculate the correct row in the array for the current process >>>>>>> row_in_array = counter + local_rows_start >>>>>>> matrix.setValues( >>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>> ) >>>>>>> >>>>>>> # Assembly the matrix to compute the final structure >>>>>>> matrix.assemblyBegin() >>>>>>> matrix.assemblyEnd() >>>>>>> >>>>>>> return matrix >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>> # A' = Phi.T * A * Phi >>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>> # -------------------------------------------- >>>>>>> >>>>>>> m, k = 100, 7 >>>>>>> # Generate the random numpy matrices >>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>> >>>>>>> # -------------------------------------------- >>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>> # -------------------------------------------- >>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>> Print(f"{Aprime_np}") >>>>>>> >>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>> >>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>> >>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>> >>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>> # A_prime will store the result of the operation. 
>>>>>>> A_prime = A.ptap(Phi) >>>>>>> >>>>>>> Here is the error >>>>>>> >>>>>>> MATRIX mpiaij A [100x100] >>>>>>> Assembled >>>>>>> >>>>>>> Partitioning for A: >>>>>>> Rank 0: Rows [0, 34) >>>>>>> Rank 1: Rows [34, 67) >>>>>>> Rank 2: Rows [67, 100) >>>>>>> >>>>>>> MATRIX mpiaij Phi [100x7] >>>>>>> Assembled >>>>>>> >>>>>>> Partitioning for Phi: >>>>>>> Rank 0: Rows [0, 34) >>>>>>> Rank 1: Rows [34, 67) >>>>>>> Rank 2: Rows [67, 100) >>>>>>> >>>>>>> Traceback (most recent call last): >>>>>>> File "/Users/boutsitron/work/galerkin_projection.py", line 87, in >>>>>>> A_prime = A.ptap(Phi) >>>>>>> ^^^^^^^^^^^ >>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1525, in petsc4py.PETSc.Mat.ptap >>>>>>> petsc4py.PETSc.Error: error code 60 >>>>>>> [0] MatPtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9896 >>>>>>> [0] MatProductSetFromOptions() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:541 >>>>>>> [0] MatProductSetFromOptions_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matproduct.c:435 >>>>>>> [0] MatProductSetFromOptions_MPIAIJ() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2372 >>>>>>> [0] MatProductSetFromOptions_MPIAIJ_PtAP() at /Users/boutsitron/firedrake/src/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2266 >>>>>>> [0] Nonconforming object sizes >>>>>>> [0] Matrix local dimensions are incompatible, Acol (0, 100) != Prow (0,34) >>>>>>> Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 0 >>>>>>> >>>>>>> Any thoughts? >>>>>>> >>>>>>> Thanks, >>>>>>> Thanos >>>>>>> >>>>>>>> On 5 Oct 2023, at 14:23, Thanasis Boutsikakis > wrote: >>>>>>>> >>>>>>>> This works Pierre. Amazing input, thanks a lot! >>>>>>>> >>>>>>>>> On 5 Oct 2023, at 14:17, Pierre Jolivet > wrote: >>>>>>>>> >>>>>>>>> Not a petsc4py expert here, but you may to try instead: >>>>>>>>> A_prime = A.ptap(Phi) >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Pierre >>>>>>>>> >>>>>>>>>> On 5 Oct 2023, at 2:02?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>> >>>>>>>>>> Thanks Pierre! So I tried this and got a segmentation fault. Is this supposed to work right off the bat or am I missing sth? >>>>>>>>>> >>>>>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. 
>>>>>>>>>> Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>>>>>> >>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>> >>>>>>>>>> import time >>>>>>>>>> >>>>>>>>>> import numpy as np >>>>>>>>>> from colorama import Fore >>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>> from mpi4py import MPI >>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>> >>>>>>>>>> from utilities import ( >>>>>>>>>> Print, >>>>>>>>>> create_petsc_matrix, >>>>>>>>>> print_matrix_partitioning, >>>>>>>>>> ) >>>>>>>>>> >>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>> >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> >>>>>>>>>> m, k = 11, 7 >>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>> >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> # TEST: Galerking projection of numpy matrices A_np and Phi_np >>>>>>>>>> # -------------------------------------------- >>>>>>>>>> Aprime_np = Phi_np.T @ A_np @ Phi_np >>>>>>>>>> Print(f"MATRIX Aprime_np [{Aprime_np.shape[0]}x{Aprime_np.shape[1]}]") >>>>>>>>>> Print(f"{Aprime_np}") >>>>>>>>>> >>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>> A = create_petsc_matrix(A_np, sparse=False) >>>>>>>>>> >>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>> Phi = create_petsc_matrix(Phi_np, sparse=False) >>>>>>>>>> >>>>>>>>>> # Create an empty PETSc matrix object to store the result of the PtAP operation. >>>>>>>>>> # This will hold the result A' = Phi.T * A * Phi after the computation. >>>>>>>>>> A_prime = create_petsc_matrix(np.zeros((k, k)), sparse=False) >>>>>>>>>> >>>>>>>>>> # Perform the PtAP (Phi Transpose times A times Phi) operation. >>>>>>>>>> # In mathematical terms, this operation is A' = Phi.T * A * Phi. >>>>>>>>>> # A_prime will store the result of the operation. >>>>>>>>>> Phi.PtAP(A, A_prime) >>>>>>>>>> >>>>>>>>>>> On 5 Oct 2023, at 13:22, Pierre Jolivet > wrote: >>>>>>>>>>> >>>>>>>>>>> How about using ptap which will use MatPtAP? >>>>>>>>>>> It will be more efficient (and it will help you bypass the issue). >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Pierre >>>>>>>>>>> >>>>>>>>>>>> On 5 Oct 2023, at 1:18?PM, Thanasis Boutsikakis > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Sorry, forgot function create_petsc_matrix() >>>>>>>>>>>> >>>>>>>>>>>> def create_petsc_matrix(input_array sparse=True): >>>>>>>>>>>> """Create a PETSc matrix from an input_array >>>>>>>>>>>> >>>>>>>>>>>> Args: >>>>>>>>>>>> input_array (np array): Input array >>>>>>>>>>>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>>>>>>>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>>>>>>>> >>>>>>>>>>>> Returns: >>>>>>>>>>>> PETSc mat: PETSc matrix >>>>>>>>>>>> """ >>>>>>>>>>>> # Check if input_array is 1D and reshape if necessary >>>>>>>>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>>>>>>>> global_rows, global_cols = input_array.shape >>>>>>>>>>>> >>>>>>>>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>>>>>>>> >>>>>>>>>>>> # Create a sparse or dense matrix based on the 'sparse' argument >>>>>>>>>>>> if sparse: >>>>>>>>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>>>>>>>>>>> else: >>>>>>>>>>>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>>>>>>>>>>> matrix.setUp() >>>>>>>>>>>> >>>>>>>>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>>>>>>>> >>>>>>>>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>>>>>>>> # Calculate the correct row in the array for the current process >>>>>>>>>>>> row_in_array = counter + local_rows_start >>>>>>>>>>>> matrix.setValues( >>>>>>>>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>>>>>>>> ) >>>>>>>>>>>> >>>>>>>>>>>> # Assembly the matrix to compute the final structure >>>>>>>>>>>> matrix.assemblyBegin() >>>>>>>>>>>> matrix.assemblyEnd() >>>>>>>>>>>> >>>>>>>>>>>> return matrix >>>>>>>>>>>> >>>>>>>>>>>>> On 5 Oct 2023, at 13:09, Thanasis Boutsikakis > wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I am trying a Galerkin projection (see MFE below) and I cannot get the Phi.transposeMatMult(A, A1) work. The error is >>>>>>>>>>>>> >>>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>>> File "petsc4py/PETSc/Mat.pyx", line 1514, in petsc4py.PETSc.Mat.transposeMatMult >>>>>>>>>>>>> petsc4py.PETSc.Error: error code 56 >>>>>>>>>>>>> [0] MatTransposeMatMult() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:10135 >>>>>>>>>>>>> [0] MatProduct_Private() at /Users/boutsitron/firedrake/src/petsc/src/mat/interface/matrix.c:9989 >>>>>>>>>>>>> [0] No support for this operation for this object type >>>>>>>>>>>>> [0] Call MatProductCreate() first >>>>>>>>>>>>> >>>>>>>>>>>>> Do you know if these exposed to petsc4py or maybe there is another way? 
I cannot get the MFE to work (neither in sequential nor in parallel) >>>>>>>>>>>>> >>>>>>>>>>>>> """Experimenting with PETSc mat-mat multiplication""" >>>>>>>>>>>>> >>>>>>>>>>>>> import time >>>>>>>>>>>>> >>>>>>>>>>>>> import numpy as np >>>>>>>>>>>>> from colorama import Fore >>>>>>>>>>>>> from firedrake import COMM_SELF, COMM_WORLD >>>>>>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>>>>>> from mpi4py import MPI >>>>>>>>>>>>> from numpy.testing import assert_array_almost_equal >>>>>>>>>>>>> >>>>>>>>>>>>> from utilities import ( >>>>>>>>>>>>> Print, >>>>>>>>>>>>> create_petsc_matrix, >>>>>>>>>>>>> ) >>>>>>>>>>>>> >>>>>>>>>>>>> nproc = COMM_WORLD.size >>>>>>>>>>>>> rank = COMM_WORLD.rank >>>>>>>>>>>>> >>>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>>> # EXP: Galerkin projection of an mpi PETSc matrix A with an mpi PETSc matrix Phi >>>>>>>>>>>>> # A' = Phi.T * A * Phi >>>>>>>>>>>>> # [k x k] <- [k x m] x [m x m] x [m x k] >>>>>>>>>>>>> # -------------------------------------------- >>>>>>>>>>>>> >>>>>>>>>>>>> m, k = 11, 7 >>>>>>>>>>>>> # Generate the random numpy matrices >>>>>>>>>>>>> np.random.seed(0) # sets the seed to 0 >>>>>>>>>>>>> A_np = np.random.randint(low=0, high=6, size=(m, m)) >>>>>>>>>>>>> Phi_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>>>>>>>>> >>>>>>>>>>>>> # Create A as an mpi matrix distributed on each process >>>>>>>>>>>>> A = create_petsc_matrix(A_np) >>>>>>>>>>>>> >>>>>>>>>>>>> # Create Phi as an mpi matrix distributed on each process >>>>>>>>>>>>> Phi = create_petsc_matrix(Phi_np) >>>>>>>>>>>>> >>>>>>>>>>>>> A1 = create_petsc_matrix(np.zeros((k, m))) >>>>>>>>>>>>> >>>>>>>>>>>>> # Now A1 contains the result of Phi^T * A >>>>>>>>>>>>> Phi.transposeMatMult(A, A1) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From erdemguer at proton.me Wed Oct 11 03:42:14 2023 From: erdemguer at proton.me (erdemguer) Date: Wed, 11 Oct 2023 08:42:14 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: Hi again, Here is my code: #include static char help[] = "dmplex"; int main(int argc, char **argv) { PetscCall(PetscInitialize(&argc, &argv, NULL, help)); DM dm, dm_dist; PetscSection section; PetscInt cStart, cEndInterior, cEnd, rank; PetscInt nc[3] = {3, 3, 3}; PetscReal upper[3] = {1, 1, 1}; PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); DMViewFromOptions(dm, NULL, "-dm1_view"); PetscCall(DMSetFromOptions(dm)); DMViewFromOptions(dm, NULL, "-dm2_view"); PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); DMPlexComputeCellTypes(dm); PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, cEndInterior, cEnd); PetscInt nField = 1, nDof = 3, field = 0; PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); PetscSectionSetNumFields(section, nField); PetscCall(PetscSectionSetChart(section, cStart, cEnd)); for (PetscInt p = cStart; p < cEnd; p++) { PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); PetscCall(PetscSectionSetDof(section, p, nDof)); } PetscCall(PetscSectionSetUp(section)); DMSetLocalSection(dm, section); DMViewFromOptions(dm, NULL, "-dm3_view"); DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); DMViewFromOptions(dm, NULL, "-dm4_view"); PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); if (dm_dist) { DMDestroy(&dm); dm = dm_dist; } DMViewFromOptions(dm, NULL, "-dm5_view"); PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); DMPlexComputeCellTypes(dm); PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, cEndInterior, cEnd); DMDestroy(&dm); PetscCall(PetscFinalize());} This codes output is currently (on 2 processors) is: Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 DMView outputs: dm1_view (after creation): DM Object: 2 MPI processes type: plex DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 64 0 Number of 1-cells per rank: 144 0 Number of 2-cells per rank: 108 0 Number of 3-cells per rank: 27 0 Labels: marker: 1 strata with value/size (1 (218)) Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) dm2_view (after setfromoptions): DM Object: 2 MPI processes type: plex DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 40 46 Number of 1-cells per rank: 83 95 Number of 2-cells per rank: 57 64 Number of 3-cells per rank: 13 14 Labels: depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) marker: 1 strata with value/size (1 (109)) Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) dm3_view (after setting local section): DM Object: 2 MPI processes type: plex 
DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 40 46 Number of 1-cells per rank: 83 95 Number of 2-cells per rank: 57 64 Number of 3-cells per rank: 13 14 Labels: depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) marker: 1 strata with value/size (1 (109)) Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) Field Field_0: adjacency FEM dm4_view (after setting adjacency): DM Object: 2 MPI processes type: plex DM_0x84000004_0 in 3 dimensions: Number of 0-cells per rank: 40 46 Number of 1-cells per rank: 83 95 Number of 2-cells per rank: 57 64 Number of 3-cells per rank: 13 14 Labels: depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) marker: 1 strata with value/size (1 (109)) Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) Field Field_0: adjacency FVM++ dm5_view (after distribution): DM Object: Parallel Mesh 2 MPI processes type: plex Parallel Mesh in 3 dimensions: Number of 0-cells per rank: 64 60 Number of 1-cells per rank: 144 133 Number of 2-cells per rank: 108 98 Number of 3-cells per rank: 27 24 Labels: depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) marker: 1 strata with value/size (1 (218)) Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) Field Field_0: adjacency FVM++ Thanks, Guer. Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: > On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: > >> Hi, >> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >> >> - DMPlexCreateBoxMesh >> >> - DMSetFromOptions >> - PetscSectionCreate >> - PetscSectionSetNumFields >> - PetscSectionSetFieldDof >> >> - PetscSectionSetDof >> >> - PetscSectionSetUp >> - DMSetLocalSection >> - DMSetAdjacency >> - DMPlexDistribute >> >> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. > > Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put > > DMViewFromOptions(dm, NULL, "-dm1_view") > > with a different string after each call. > >> But I couldn't figure out how to decide where the ghost/processor boundary cells start. > > Please send the actual code because the above is not specific enough. For example, you will not have > "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, > so each process gets a unique set. > > Thanks, > > Matt > >> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >> >> Thanks again, >> Guer. 
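As an illustration of the "where do the processor-boundary cells start" question above (this sketch is not part of the original exchange): after DMPlexDistribute(dm, 1, NULL, &dm_dist), the cells a rank received through the overlap are exactly the cells that appear as leaves of the DM's point SF, so they can be counted or flagged directly. The fragment below assumes it is dropped into the earlier listing right after distribution, with dm and rank already in scope.

/* Count cells obtained through the overlap: a point that is a leaf of the
   point SF is owned by another rank. */
PetscSF         sf;
const PetscInt *ilocal;
PetscInt        nleaves, nOverlap = 0, cStart, cEnd;

PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
PetscCall(DMGetPointSF(dm, &sf));
PetscCall(PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL));
for (PetscInt l = 0; l < nleaves; ++l) {
  const PetscInt p = ilocal ? ilocal[l] : l; /* NULL ilocal means leaves are 0..nleaves-1 */
  if (p >= cStart && p < cEnd) ++nOverlap;
}
PetscPrintf(PETSC_COMM_SELF, "Rank %d: %d local cells, %d of them from the overlap\n",
            (int)rank, (int)(cEnd - cStart), (int)nOverlap);

The DM_POLYTOPE_FV_GHOST and DM_POLYTOPE_INTERIOR_GHOST strata stay empty in this setup, which is why the calls above keep returning -1; those cell types only appear once ghost cells are explicitly constructed, as discussed later in the thread.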
>> >> ------- Original Message ------- >> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >> >>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>> >>>> Hi, >>>> >>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>> >>>> Create a DMPlex mesh from GMSH. >>>> Reorder it with DMPlexPermute. >>>> Create necessary pre-processing arrays related to the mesh/problem. >>>> Create field(s) with multi-dofs. >>>> Create residual vectors. >>>> Define a function to calculate the residual for each cell and, use SNES. >>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>> >>> The intention was that there is enough information in the manual to do this. >>> >>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>> >>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>> >>> Thanks, >>> >>> Matt >>> >>>> Thank you. >>>> Guer. >>>> >>>> Sent with [Proton Mail](https://proton.me/) secure email. >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 06:02:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 07:02:42 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users < petsc-users at mcs.anl.gov> wrote: > Good Evening, > > I am looking to implement a form of Navier-Stokes with SUPG Stabilization > and shock capturing using PETSc's FEM infrastructure. In this > implementation, I need access to the cell's shape function gradients and > natural coordinate gradients for calculations within the point-wise > residual calculations. How do I get these quantities at the quadrature > points? The signatures for fo and f1 don't seem to contain this information. > Are you sure you need those? Darsh and I implemented SUPG without that. You would need local second derivative information, which you can get using -dm_ds_jet_degree 2. If you check in an example, I can go over it. Thanks, Matt > Thank you in advance for your time. 
> Brandon > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 06:07:25 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 07:07:25 -0400 Subject: [petsc-users] Compilation failure of PETSc with "The procedure name of the INTERFACE block conflicts with a name in the encompassing scoping unit" In-Reply-To: References: Message-ID: On Wed, Oct 11, 2023 at 4:22?AM Richter, Roland wrote: > Hei, > > following my last question I managed to configure PETSc with Intel MPI and > Intel OneAPI using the following configure-line: > > > > *./configure --prefix=/media/storage/local_opt/petsc > --with-scalar-type=complex --with-cc=mpiicc --with-cxx=mpiicpc > --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC -march=native > -mavx2" --with-fc=mpiifort --with-pic=true --with-mpi=true > --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ > --with-openmp=true --download-hdf5=yes --download-netcdf=yes > --download-chaco=no --download-metis=yes --download-slepc=yes > --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes > --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 > --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost > --download-opencascade=yes --with-fftw=1 > --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes > --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 > --download-muparser=yes --download-p4est=yes --download-sowing=yes > --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 --with-cuda=0* > > > > Now, however, compilation fails with the following error: > > /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90(699): > error #6623: The procedure name of the INTERFACE block conflicts with a > name in the encompassing scoping unit. [PCGASMCREATESUBDOMAINS2D] > > subroutine PCGASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,j,z) > > -----------------^ > > /home/user/Downloads/git-files/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90(1199): > error #6623: The procedure name of the INTERFACE block conflicts with a > name in the encompassing scoping unit. [PCASMCREATESUBDOMAINS2D] > > subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > > -----------------^ > > I'm on the latest version of origin/main, but can't figure out how to fix > that issue by myself. Therefore, I'd appreciate additional insight. > You have old build files in the tree. We changed the Fortran stubs to be generated in the PETSC_ARCH tree so that you can build the stubs for different branches in the same PETSc tree. You have old stubs in the src tree. You can get rid of these using git clean -f -d -x unless you have your own files in the source tree, in which case you need to remove the ftn-auto-interfaces directories yourself. Thanks, Matt > Thanks! > > Regards, > > Roland Richter > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bldenton at buffalo.edu Wed Oct 11 07:25:10 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 12:25:10 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: I was thinking about trying to implement Ben Kirk's approach to Navier-Stokes (see attached paper; Section 5). His approach uses these quantities to align the orientation of the unstructured element/cell with the fluid velocity to apply the stabilization/upwinding and to detect shocks. If you have an example of the approach you mentioned, could you please send it over so I can review it? On Oct 11, 2023 6:02 AM, Matthew Knepley wrote: On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users > wrote: Good Evening, I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. Are you sure you need those? Darsh and I implemented SUPG without that. You would need local second derivative information, which you can get using -dm_ds_jet_degree 2. If you check in an example, I can go over it. Thanks, Matt Thank you in advance for your time. Brandon -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 08:17:36 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 09:17:36 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: > Hi again, > I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
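For completeness, a minimal sketch of how those FV boundary ghost cells are usually added (illustrative only, and separate from the code attached to the reply below; it assumes the routine is spelled DMPlexConstructGhostCells() in current PETSc):

/* Add one ghost cell outside every boundary face, then query the FV ghost stratum. */
DM       gdm;
PetscInt cStart, cEnd, cEndInterior;

PetscCall(DMPlexConstructGhostCells(dm, NULL, NULL, &gdm)); /* NULL label -> "Face Sets" */
PetscCall(DMDestroy(&dm));
dm = gdm;
PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL));
/* cells [cStart, cEndInterior) are interior, [cEndInterior, cEnd) are boundary ghosts */

After this call the DM_POLYTOPE_FV_GHOST stratum is non-empty, so cEndInterior becomes a meaningful bound instead of -1.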
Thanks, Matt > Here is my code: > #include > static char help[] = "dmplex"; > > int main(int argc, char **argv) > { > PetscCall(PetscInitialize(&argc, &argv, NULL, help)); > DM dm, dm_dist; > PetscSection section; > PetscInt cStart, cEndInterior, cEnd, rank; > PetscInt nc[3] = {3, 3, 3}; > PetscReal upper[3] = {1, 1, 1}; > > PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); > > DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, > NULL, PETSC_TRUE, &dm); > DMViewFromOptions(dm, NULL, "-dm1_view"); > PetscCall(DMSetFromOptions(dm)); > DMViewFromOptions(dm, NULL, "-dm2_view"); > > PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); > DMPlexComputeCellTypes(dm); > PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, > &cEndInterior, NULL)); > PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: > %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, > cEndInterior, cEnd); > > PetscInt nField = 1, nDof = 3, field = 0; > PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); > PetscSectionSetNumFields(section, nField); > PetscCall(PetscSectionSetChart(section, cStart, cEnd)); > for (PetscInt p = cStart; p < cEnd; p++) > { > PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); > PetscCall(PetscSectionSetDof(section, p, nDof)); > } > > PetscCall(PetscSectionSetUp(section)); > > DMSetLocalSection(dm, section); > DMViewFromOptions(dm, NULL, "-dm3_view"); > > DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); > DMViewFromOptions(dm, NULL, "-dm4_view"); > PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); > if (dm_dist) > { > DMDestroy(&dm); > dm = dm_dist; > } > DMViewFromOptions(dm, NULL, "-dm5_view"); > PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); > DMPlexComputeCellTypes(dm); > PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, > &cEndInterior, NULL)); > PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, > cEndInterior: %d, cEnd: %d\n", rank, cStart, > cEndInterior, cEnd); > > DMDestroy(&dm); > PetscCall(PetscFinalize()); > } > > This codes output is currently (on 2 processors) is: > Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 > Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 > After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 > After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 > > DMView outputs: > dm1_view (after creation): > DM Object: 2 MPI processes > type: plex > DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 64 0 > Number of 1-cells per rank: 144 0 > Number of 2-cells per rank: 108 0 > Number of 3-cells per rank: 27 0 > Labels: > marker: 1 strata with value/size (1 (218)) > Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), > 2 (9)) > depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) > celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) > > dm2_view (after setfromoptions): > DM Object: 2 MPI processes > type: plex > DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 40 46 > Number of 1-cells per rank: 83 95 > Number of 2-cells per rank: 57 64 > Number of 3-cells per rank: 13 14 > Labels: > depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) > marker: 1 strata with value/size (1 (109)) > Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) > celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) > > dm3_view (after setting local section): > DM Object: 2 MPI processes > type: plex > 
DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 40 46 > Number of 1-cells per rank: 83 95 > Number of 2-cells per rank: 57 64 > Number of 3-cells per rank: 13 14 > Labels: > depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) > marker: 1 strata with value/size (1 (109)) > Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) > celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) > Field Field_0: > adjacency FEM > > dm4_view (after setting adjacency): > DM Object: 2 MPI processes > type: plex > DM_0x84000004_0 in 3 dimensions: > Number of 0-cells per rank: 40 46 > Number of 1-cells per rank: 83 95 > Number of 2-cells per rank: 57 64 > Number of 3-cells per rank: 13 14 > Labels: > depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) > marker: 1 strata with value/size (1 (109)) > Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) > celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) > Field Field_0: > adjacency FVM++ > > dm5_view (after distribution): > DM Object: Parallel Mesh 2 MPI processes > type: plex > Parallel Mesh in 3 dimensions: > Number of 0-cells per rank: 64 60 > Number of 1-cells per rank: 144 133 > Number of 2-cells per rank: 108 98 > Number of 3-cells per rank: 27 24 > Labels: > depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) > marker: 1 strata with value/size (1 (218)) > Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), > 6 (9)) > celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) > Field Field_0: > adjacency FVM++ > > Thanks, > Guer. > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: > >> >> Hi, >> Sorry for my late response. I tried with your suggestions and I think I >> made a progress. But I still got issues. Let me explain my latest mesh >> routine: >> >> >> 1. DMPlexCreateBoxMesh >> 2. DMSetFromOptions >> 3. PetscSectionCreate >> 4. PetscSectionSetNumFields >> 5. PetscSectionSetFieldDof >> 6. PetscSectionSetDof >> 7. PetscSectionSetUp >> 8. DMSetLocalSection >> 9. DMSetAdjacency >> 10. DMPlexDistribute >> >> >> It's still not working but it's promising, if I call >> DMPlexGetDepthStratum for cells, I can see that after distribution >> processors have more cells. >> > > Please send the output of DMPlexView() for each incarnation of the mesh. > What I do is put > > DMViewFromOptions(dm, NULL, "-dm1_view") > > > with a different string after each call. > >> But I couldn't figure out how to decide where the ghost/processor >> boundary cells start. >> > > Please send the actual code because the above is not specific enough. For > example, you will not have > "ghost cells" unless you partition with overlap. This is because by > default cells are the partitioned quantity, > so each process gets a unique set. > > Thanks, > > Matt > >> In older mails I saw there is a function DMPlexGetHybridBounds but I >> think that function is deprecated. I tried to use, >> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting -1 >> as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. 
>> I think I can calculate the ghost cell indices using cStart/cEnd before & >> after distribution but I think there is a better way I'm currently missing. >> >> Thanks again, >> Guer. >> >> ------- Original Message ------- >> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> Hi, >>> >>> I am currently using DMPlex in my code. It runs serially at the moment, >>> but I'm interested in adding parallel options. Here is my workflow: >>> >>> Create a DMPlex mesh from GMSH. >>> Reorder it with DMPlexPermute. >>> Create necessary pre-processing arrays related to the mesh/problem. >>> Create field(s) with multi-dofs. >>> Create residual vectors. >>> Define a function to calculate the residual for each cell and, use SNES. >>> As you can see, I'm not using FV or FE structures (most examples do). >>> Now, I'm trying to implement this in parallel using a similar approach. >>> However, I'm struggling to understand how to create corresponding vectors >>> and how to obtain index sets for each processor. Is there a tutorial or >>> paper that covers this topic? >>> >> >> The intention was that there is enough information in the manual to do >> this. >> >> Using PetscFE/PetscFV is not required. However, I strongly encourage you >> to use PetscSection. Without this, it would be incredibly hard to do what >> you want. Once the DM has a Section, it can do things like automatically >> create vectors and matrices for you. It can redistribute them, subset them, >> etc. The Section describes how dofs are assigned to pieces of the mesh >> (mesh points). This is in the manual, and there are a few examples that do >> it by hand. >> >> So I suggest changing your code to use PetscSection, and then letting us >> know if things still do not work. >> >> Thanks, >> >> Matt >> >>> Thank you. >>> Guer. >>> >>> Sent with Proton Mail secure email. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: application/octet-stream Size: 3039 bytes Desc: not available URL: From junchao.zhang at gmail.com Wed Oct 11 09:14:57 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 11 Oct 2023 09:14:57 -0500 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ? Thanks. --Junchao Zhang On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip wrote: > Aha! That makes sense. Thank you. 
> > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Thursday, October 5, 2023 17:29 > *To:* Fackler, Philip > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie < > sblondel at utk.edu> > *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses > switching to COO interface > > Wait a moment, it seems it was because we do not have a GPU implementation > of MatShift... > Let me see how to add it. > --Junchao Zhang > > > On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: > > Hi, Philip, > I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() > instead of the COO interface? MatSetValues() needs to copy the data from > device to host and thus is expensive. > Do you have profiling results with COO enabled? > > [image: Screenshot 2023-10-05 at 10.55.29?AM.png] > > > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: > > Hi, Philip, > I will look into the tarballs and get back to you. > Thanks. > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > We finally have xolotl ported to use the new COO interface and the > aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port > to our previous version (using MatSetValuesStencil and the default Mat and > Vec implementations), we expected to see an improvement in performance for > both the "serial" and "cuda" builds (here I'm referring to the kokkos > configuration). > > Attached are two plots that show timings for three different cases. All of > these were run on Ascent (the Summit-like training system) with 6 MPI tasks > (on a single node). The CUDA cases were given one GPU per task (and used > CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases > we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent > as possible. > > The performance of RHSJacobian (where the bulk of computation happens in > xolotl) behaved basically as expected (better than expected in the serial > build). NE_3 case in CUDA was the only one that performed worse, but not > surprisingly, since its workload for the GPUs is much smaller. We've still > got more optimization to do on this. > > The real surprise was how much worse the overall solve times were. This > seems to be due simply to switching to the kokkos-based implementation. I'm > wondering if there are any changes we can make in configuration or runtime > arguments to help with PETSc's performance here. Any help looking into this > would be appreciated. > > The tarballs linked here > > and here > > are profiling databases which, once extracted, can be viewed with > hpcviewer. I don't know how helpful that will be, but hopefully it can give > you some direction. > > Thanks for your help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot 2023-10-05 at 10.55.29?AM.png Type: image/png Size: 144341 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Oct 11 09:28:09 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 11 Oct 2023 09:28:09 -0500 (CDT) Subject: [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails In-Reply-To: References: <3CF831A3-F5DC-4055-9F00-FA7DD7242EBB@petsc.dev> <78e0a665-e6fc-4566-4900-6faa2e593c72@mcs.anl.gov> Message-ID: <9b267e6f-3e92-9492-3851-f2265231bbaa@mcs.anl.gov> The same docs should be available in https://web.cels.anl.gov/projects/petsc/download/release-snapshots/petsc-with-docs-3.20.0.tar.gz Satish On Wed, 11 Oct 2023, Richter, Roland wrote: > Hei, > Thank you very much for the answer! I looked it up, but petsc.org seems to > be a bit unstable here, quite often I can't reach petsc.org. > Regards, > Roland Richter > > -----Urspr?ngliche Nachricht----- > Von: Satish Balay > Gesendet: mandag 9. oktober 2023 17:29 > An: Barry Smith > Cc: Richter, Roland ; petsc-users at mcs.anl.gov > Betreff: Re: [petsc-users] Configuration of PETSc with Intel OneAPI and > Intel MPI fails > > Will note - OneAPI MPI usage is documented at > https://petsc.org/release/install/install/#mpi > > Satish > > On Mon, 9 Oct 2023, Barry Smith wrote: > > > > > Instead of using the mpiicc -cc=icx style use -- with-cc=mpiicc (etc) > and > > > > export I_MPI_CC=icx > > export I_MPI_CXX=icpx > > export I_MPI_F90=ifx > > > > > > > On Oct 9, 2023, at 8:32 AM, Richter, Roland > wrote: > > > > > > Hei, > > > I'm currently trying to install PETSc on a server (Ubuntu 22.04) with > Intel MPI and Intel OneAPI. To combine both, I have to use f. ex. "mpiicc > -cc=icx" as C-compiler, as described by > https://stackoverflow.com/a/76362396. Therefore, I adapted the > configure-line as follow: > > > > > > ./configure --prefix=/media/storage/local_opt/petsc > --with-scalar-type=complex --with-cc="mpiicc -cc=icx" --with-cxx="mpiicpc > -cxx=icpx" --CPPFLAGS="-fPIC -march=native -mavx2" --CXXFLAGS="-fPIC > -march=native -mavx2" --with-fc="mpiifort -fc=ifx" --with-pic=true > --with-mpi=true > --with-blaslapack-dir=/opt/intel/oneapi/mkl/latest/lib/intel64/ > --with-openmp=true --download-hdf5=yes --download-netcdf=yes > --download-chaco=no --download-metis=yes --download-slepc=yes > --download-suitesparse=yes --download-eigen=yes --download-parmetis=yes > --download-ptscotch=yes --download-mumps=yes --download-scalapack=yes > --download-superlu=yes --download-superlu_dist=yes --with-mkl_pardiso=1 > --with-boost=1 --with-boost-dir=/media/storage/local_opt/boost > --download-opencascade=yes --with-fftw=1 > --with-fftw-dir=/media/storage/local_opt/fftw3 --download-kokkos=yes > --with-mkl_sparse=1 --with-mkl_cpardiso=1 --with-mkl_sparse_optimize=1 > --download-muparser=no --download-p4est=yes --download-sowing=y > es --download-viennalcl=yes --with-zlib --force=1 --with-clean=1 > --with-cuda=1 > > > > > > The configuration, however, fails with > > > > > > The CMAKE_C_COMPILER: > > > > > > mpiicc -cc=icx > > > > > > is not a full path and was not found in the PATH > > > > > > for all additional modules which use a cmake-based configuration > approach (such as OPENCASCADE). How could I solve that problem? > > > > > > Thank you! 
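For reference, the environment-variable route suggested above, combined with the original command, would look roughly like this (a schematic sketch only; every option not shown stays exactly as in the original configure line):

export I_MPI_CC=icx
export I_MPI_CXX=icpx
export I_MPI_F90=ifx

./configure --prefix=/media/storage/local_opt/petsc \
  --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
  ...   (remaining options unchanged)

The point is that cmake-based external packages such as OPENCASCADE then see a single compiler name (mpiicc) that exists on PATH, instead of the multi-word "mpiicc -cc=icx" string that CMake rejects as "not a full path".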
> > > Regards, > > > Roland Richter > > > > > > > > From kenneth.c.hall at duke.edu Wed Oct 11 10:27:03 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Wed, 11 Oct 2023 15:27:03 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> Message-ID: Jose, Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. Thanks, and best regards, Kenneth From: Jose E. Roman Date: Wednesday, October 11, 2023 at 2:41 AM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) Kenneth, The MatDuplicate issue should be fixed in the following MR https://urldefense.com/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/6912__;!!OToaGQ!p1tu1lzpyqM4wU-3WRzXN9bH3sFnXjyJvwQZh4PQBG5GNgB472qfxKOASyjxsg23AUQGusU-HpzI855ViaFfRCI$ Note that the NLEIGS solver internally uses MatDuplicate for creating multiple copies of the shell matrix, each one with its own value of lambda. Hence your implementation of the shell matrix is not appropriate, since you have a single global lambda within the module. I have attempted to write a Fortran example that duplicates the lambda correctly (see the MR), but does not work yet. Jose > El 6 oct 2023, a las 22:28, Kenneth C Hall escribi?: > > Jose, > > Unfortunately, I was unable to implement the MATOP_DUPLICATE operation in fortran (and I do not know enough c to work in c). Here is the error message I get: > > [0]PETSC ERROR: #1 MatShellSetOperation_Fortran() at /Users/hall/Documents/Fortran_Codes/Packages/petsc/src/mat/impls/shell/ftn-custom/zshellf.c:283 > [0]PETSC ERROR: #2 src/test_nep.f90:62 > > When I look at zshellf.c, MATOP_DUPLICATE is not one of the supported operations. See below. > > Kenneth > > > /** > * Subset of MatOperation that is supported by the Fortran wrappers. > */ > enum FortranMatOperation { > FORTRAN_MATOP_MULT = 0, > FORTRAN_MATOP_MULT_ADD = 1, > FORTRAN_MATOP_MULT_TRANSPOSE = 2, > FORTRAN_MATOP_MULT_TRANSPOSE_ADD = 3, > FORTRAN_MATOP_SOR = 4, > FORTRAN_MATOP_TRANSPOSE = 5, > FORTRAN_MATOP_GET_DIAGONAL = 6, > FORTRAN_MATOP_DIAGONAL_SCALE = 7, > FORTRAN_MATOP_ZERO_ENTRIES = 8, > FORTRAN_MATOP_AXPY = 9, > FORTRAN_MATOP_SHIFT = 10, > FORTRAN_MATOP_DIAGONAL_SET = 11, > FORTRAN_MATOP_DESTROY = 12, > FORTRAN_MATOP_VIEW = 13, > FORTRAN_MATOP_CREATE_VECS = 14, > FORTRAN_MATOP_GET_DIAGONAL_BLOCK = 15, > FORTRAN_MATOP_COPY = 16, > FORTRAN_MATOP_SCALE = 17, > FORTRAN_MATOP_SET_RANDOM = 18, > FORTRAN_MATOP_ASSEMBLY_BEGIN = 19, > FORTRAN_MATOP_ASSEMBLY_END = 20, > FORTRAN_MATOP_SIZE = 21 > }; > > > From: Jose E. Roman > Date: Friday, October 6, 2023 at 7:01 AM > To: Kenneth C Hall > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) > > I am getting an error in a different place than you. I started to debug, but don't have much time at the moment. > Can you try something? Comparing to ex21.c, I see that a difference that may be relevant is the MATOP_DUPLICATE operation. Can you try defining it for your A matrix? > > Note: If you plan to use the NLEIGS solver, there is no need to define the derivative T' so you can skip the call to NEPSetJacobian(). 
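For reference, the duplicate operation itself is straightforward to supply from C even though the Fortran wrapper does not expose it. A minimal sketch (MyCtx and MatMult_A are placeholders for the user's context type and mult routine; a full version would also set MATOP_DESTROY so each copy frees its own context):

typedef struct { PetscScalar lambda; } MyCtx;   /* placeholder context */
extern PetscErrorCode MatMult_A(Mat, Vec, Vec); /* user's mult callback (placeholder) */

static PetscErrorCode MatDuplicate_MyShell(Mat A, MatDuplicateOption op, Mat *B)
{
  MyCtx   *ctx, *newctx;
  PetscInt m, n, M, N;

  PetscFunctionBeginUser;
  /* op is ignored in this sketch; we always build a fresh shell with a copied context */
  PetscCall(MatShellGetContext(A, &ctx));
  PetscCall(PetscNew(&newctx));
  *newctx = *ctx;                          /* each copy carries its own lambda */
  PetscCall(MatGetLocalSize(A, &m, &n));
  PetscCall(MatGetSize(A, &M, &N));
  PetscCall(MatCreateShell(PetscObjectComm((PetscObject)A), m, n, M, N, newctx, B));
  PetscCall(MatShellSetOperation(*B, MATOP_MULT, (void (*)(void))MatMult_A));
  PetscCall(MatShellSetOperation(*B, MATOP_DUPLICATE, (void (*)(void))MatDuplicate_MyShell));
  PetscFunctionReturn(PETSC_SUCCESS);
}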
> > Jose > > > > El 6 oct 2023, a las 0:37, Kenneth C Hall escribi?: > > > > Hi all, > > > > I have a very large eigenvalue problem of the form T(\lambda).x = 0. The eigenvalues appear in a complicated way, and I must use a matrix-free approach to compute the products T.x and T?.x. > > > > I am trying to implement in SLEPc/NEP. To get started, I have defined a much smaller and simpler system of the form > > A.x - \lambda x = 0 where A is a 10x10 matrix. This is of course a simple standard eigenvalue problem, but I am using it as a surrogate to understand how to use NEP. > > > > I have set the problem up using shell matrices (as that is my ultimate goal). The full code is attached, but here is a smaller snippet of code: > > > > !.... Create matrix-free operators for A and B > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, A, ierr)) > > PetscCall(MatCreateShell(PETSC_COMM_SELF,n,n,PETSC_DETERMINE,PETSC_DETERMINE, PETSC_NULL_INTEGER, B, ierr)) > > PetscCall(MatShellSetOperation(A, MATOP_MULT, MatMult_A, ierr)) > > PetscCall(MatShellSetOperation(B, MATOP_MULT, MatMult_B, ierr)) > > > > !.... Create nonlinear eigensolver > > PetscCall(NEPCreate(PETSC_COMM_SELF, nep, ierr)) > > > > !.... Set the problem type > > PetscCall(NEPSetProblemType(nep, NEP_GENERAL, ierr)) > > ! > > !.... set the solver type > > PetscCall(NEPSetType(nep, NEPNLEIGS, ierr)) > > ! > > !.... Set functions and Jacobians for NEP > > PetscCall(NEPSetFunction(nep, A, A, MyNEPFunction, PETSC_NULL_INTEGER, ierr)) > > PetscCall(NEPSetJacobian(nep, B, MyNEPJacobian, PETSC_NULL_INTEGER, ierr)) > > > > The code runs, calls MyNEPFunction and MatMult_A multiple times, sweeping over the prescribed RG range, but crashes before it ever calls MyNEPJacobian or MatMult_B. The NEP viewer and error messages are attached. > > > > Any help on getting this problem properly set up would be greatly appreciated. > > > > Kenneth Hall > > ATTACHMENTS: > > test_nep.f90 > > code_output > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From facklerpw at ornl.gov Wed Oct 11 10:31:02 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Wed, 11 Oct 2023 15:31:02 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: I'm on it. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Wednesday, October 11, 2023 10:14 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Hi, Philip, Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ? Thanks. --Junchao Zhang On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip > wrote: Aha! That makes sense. Thank you. 
Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net >; Blondel, Sophie > Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 11 12:03:12 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 11 Oct 2023 11:03:12 -0600 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: Message-ID: <87ttqx148f.fsf@jedbrown.org> I don't see an attachment, but his thesis used conservative variables and defined an effective length scale in a way that seemed to assume constant shape function gradients. 
I'm not aware of systematic literature comparing the covariant and contravariant length measures on anisotropic meshes, but I believe most people working in the Shakib/Hughes approach use the covariant measure. Our docs have a brief discussion of this choice. https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet Matt, I don't understand how the second derivative comes into play as a length measure on anistropic meshes -- the second derivatives can be uniformly zero and yet you still need a length measure. Brandon Denton via petsc-users writes: > I was thinking about trying to implement Ben Kirk's approach to Navier-Stokes (see attached paper; Section 5). His approach uses these quantities to align the orientation of the unstructured element/cell with the fluid velocity to apply the stabilization/upwinding and to detect shocks. > > If you have an example of the approach you mentioned, could you please send it over so I can review it? > > On Oct 11, 2023 6:02 AM, Matthew Knepley wrote: > On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users > wrote: > Good Evening, > > I am looking to implement a form of Navier-Stokes with SUPG Stabilization and shock capturing using PETSc's FEM infrastructure. In this implementation, I need access to the cell's shape function gradients and natural coordinate gradients for calculations within the point-wise residual calculations. How do I get these quantities at the quadrature points? The signatures for fo and f1 don't seem to contain this information. > > Are you sure you need those? Darsh and I implemented SUPG without that. You would need local second derivative information, which you can get using -dm_ds_jet_degree 2. If you check in an example, I can go over it. > > Thanks, > > Matt > > Thank you in advance for your time. > Brandon > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Wed Oct 11 12:33:54 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 13:33:54 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: <87ttqx148f.fsf@jedbrown.org> References: <87ttqx148f.fsf@jedbrown.org> Message-ID: On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > I don't see an attachment, but his thesis used conservative variables and > defined an effective length scale in a way that seemed to assume constant > shape function gradients. I'm not aware of systematic literature comparing > the covariant and contravariant length measures on anisotropic meshes, but > I believe most people working in the Shakib/Hughes approach use the > covariant measure. Our docs have a brief discussion of this choice. > > https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet > > Matt, I don't understand how the second derivative comes into play as a > length measure on anistropic meshes -- the second derivatives can be > uniformly zero and yet you still need a length measure. > I was talking about the usual SUPG where we just penalize the true residual. Matt > Brandon Denton via petsc-users writes: > > > I was thinking about trying to implement Ben Kirk's approach to > Navier-Stokes (see attached paper; Section 5). His approach uses these > quantities to align the orientation of the unstructured element/cell with > the fluid velocity to apply the stabilization/upwinding and to detect > shocks. 
> > > > If you have an example of the approach you mentioned, could you please > send it over so I can review it? > > > > On Oct 11, 2023 6:02 AM, Matthew Knepley wrote: > > On Tue, Oct 10, 2023 at 9:34?PM Brandon Denton via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Good Evening, > > > > I am looking to implement a form of Navier-Stokes with SUPG > Stabilization and shock capturing using PETSc's FEM infrastructure. In this > implementation, I need access to the cell's shape function gradients and > natural coordinate gradients for calculations within the point-wise > residual calculations. How do I get these quantities at the quadrature > points? The signatures for fo and f1 don't seem to contain this information. > > > > Are you sure you need those? Darsh and I implemented SUPG without that. > You would need local second derivative information, which you can get using > -dm_ds_jet_degree 2. If you check in an example, I can go over it. > > > > Thanks, > > > > Matt > > > > Thank you in advance for your time. > > Brandon > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/< > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 11 12:38:17 2023 From: jed at jedbrown.org (Jed Brown) Date: Wed, 11 Oct 2023 11:38:17 -0600 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> Message-ID: <87o7h512ly.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > >> I don't see an attachment, but his thesis used conservative variables and >> defined an effective length scale in a way that seemed to assume constant >> shape function gradients. I'm not aware of systematic literature comparing >> the covariant and contravariant length measures on anisotropic meshes, but >> I believe most people working in the Shakib/Hughes approach use the >> covariant measure. Our docs have a brief discussion of this choice. >> >> https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). 
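To make the length-scale remark concrete, one common Shakib/Hughes-style construction (written here as a sketch in the covariant-measure convention discussed above; C_I is the usual inverse-estimate constant, \Delta t the time step, \nu the viscosity) is

    g_{ij} = \sum_k (\partial \xi_k / \partial x_i) (\partial \xi_k / \partial x_j),
    \tau   = ( 4/\Delta t^2 + u_i g_{ij} u_j + C_I \nu^2 g_{ij} g_{ij} )^{-1/2}.

Since g_{ij} is built only from the derivatives of the reference coordinates \xi with respect to the physical coordinates x, it remains well defined on anisotropic elements even where second derivatives of the shape functions vanish, which is the point being made above.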
From bldenton at buffalo.edu Wed Oct 11 13:09:01 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 18:09:01 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: <87o7h512ly.fsf@jedbrown.org> References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: Thank you for the discussion. Are we agreed then that the derivatives of the natural coordinates are required for the described approach? If so, is this something PETSc can currently do within the point-wise residual functions? Matt - Thank you for the command line option for the 2nd derivatives. Those will be needed to implement the discussed approach. Specifically in the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). What is a good reference for the usual SUPG method you are referencing? I've been looking through my textbooks but haven't found a good reference. Jed - Thank you for the link. I will review the information on it. Sorry about the attachment. I will upload it to this thread later (I'm at work right now and I can't do it from here). ________________________________ From: Jed Brown Sent: Wednesday, October 11, 2023 1:38 PM To: Matthew Knepley Cc: Brandon Denton ; petsc-users Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization Matthew Knepley writes: > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > >> I don't see an attachment, but his thesis used conservative variables and >> defined an effective length scale in a way that seemed to assume constant >> shape function gradients. I'm not aware of systematic literature comparing >> the covariant and contravariant length measures on anisotropic meshes, but >> I believe most people working in the Shakib/Hughes approach use the >> covariant measure. Our docs have a brief discussion of this choice. >> >> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 14:13:40 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 15:13:40 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: On Wed, Oct 11, 2023 at 2:09?PM Brandon Denton wrote: > Thank you for the discussion. > > Are we agreed then that the derivatives of the natural coordinates are > required for the described approach? 
If so, is this something PETSc can > currently do within the point-wise residual functions? > I am not sure what natural coordinates are. Do we just mean the Jacobian, derivatives of the map between reference and real coordinates? If so, yes the Jacobian is available. Right now I do not pass it directly, but passing it is easy. Thanks, Matt > Matt - Thank you for the command line option for the 2nd derivatives. > Those will be needed to implement the discussed approach. Specifically in > the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). > What is a good reference for the usual SUPG method you are referencing? > I've been looking through my textbooks but haven't found a good reference. > > Jed - Thank you for the link. I will review the information on it. > > Sorry about the attachment. I will upload it to this thread later (I'm at > work right now and I can't do it from here). > ------------------------------ > *From:* Jed Brown > *Sent:* Wednesday, October 11, 2023 1:38 PM > *To:* Matthew Knepley > *Cc:* Brandon Denton ; petsc-users < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] FEM Implementation of NS with SUPG > Stabilization > > Matthew Knepley writes: > > > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > > > >> I don't see an attachment, but his thesis used conservative variables > and > >> defined an effective length scale in a way that seemed to assume > constant > >> shape function gradients. I'm not aware of systematic literature > comparing > >> the covariant and contravariant length measures on anisotropic meshes, > but > >> I believe most people working in the Shakib/Hughes approach use the > >> covariant measure. Our docs have a brief discussion of this choice. > >> > >> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 > > >> > >> Matt, I don't understand how the second derivative comes into play as a > >> length measure on anistropic meshes -- the second derivatives can be > >> uniformly zero and yet you still need a length measure. > >> > > > > I was talking about the usual SUPG where we just penalize the true > residual. > > I think you're focused on computing the strong diffusive flux (which can > be done using second derivatives or by a projection; the latter produces > somewhat better results). But you still need a length scale and that's most > naturally computed using the derivative of reference coordinates with > respect to physical (or equivalently, the associated metric tensor). > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bldenton at buffalo.edu Wed Oct 11 15:14:16 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Wed, 11 Oct 2023 20:14:16 +0000 Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: By natural coordinates, I am referring to the reference element coordinates. Usually these are represented as (xi, eta, zeta) in the literature. Yes. I would like to have the Jacobian and the derivatives of the map available within PetscDSSetResidual() f0 and f1 functions. I believe the DMPlexComputeCellGeometryFEM() function provides this information. Is there a way to get the cell shape functions as well? If not, can we talk about this more? I would like to understand how the shape functions are addressed within PETSc. Dr. Kirk's approach uses the shape function gradients in its SUPG parameter. I'd love to talk with you about this in more detail.
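For reference, a minimal sketch of pulling those mapping derivatives out cell by cell (this assumes a 3D mesh, and assumes that passing NULL for the quadrature returns the single-point, affine cell geometry; the metric in the comment is an illustration, not a PETSc call):

  PetscInt  cStart, cEnd, c;
  PetscReal v0[3], J[9], invJ[9], detJ;   /* sized for dim = 3 */

  PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
  for (c = cStart; c < cEnd; ++c) {
    /* J is dx/dxi and invJ is dxi/dx for the reference-to-physical map of cell c */
    PetscCall(DMPlexComputeCellGeometryFEM(dm, c, NULL, v0, J, invJ, &detJ));
    /* with row-major storage and dim = 3, the covariant metric discussed earlier is
       g[i][j] = sum_k invJ[k*3 + i] * invJ[k*3 + j] */
  }

Whether this is done once per cell in a setup pass or per quadrature point (by passing a PetscQuadrature instead of NULL) depends on how the stabilization parameter is to be evaluated.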
>> >> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Wed Oct 11 16:59:20 2023 From: erdemguer at proton.me (erdemguer) Date: Wed, 11 Oct 2023 21:59:20 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: Message-ID: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Thank you! That's exactly what I need. Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: > On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: > >> Hi again, > > I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
> > Thanks, > > Matt > >> Here is my code: >> #include >> static char help[] = "dmplex"; >> >> int main(int argc, char **argv) >> { >> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >> DM dm, dm_dist; >> PetscSection section; >> PetscInt cStart, cEndInterior, cEnd, rank; >> PetscInt nc[3] = {3, 3, 3}; >> PetscReal upper[3] = {1, 1, 1}; >> >> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >> >> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >> DMViewFromOptions(dm, NULL, "-dm1_view"); >> PetscCall(DMSetFromOptions(dm)); >> DMViewFromOptions(dm, NULL, "-dm2_view"); >> >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> PetscInt nField = 1, nDof = 3, field = 0; >> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >> PetscSectionSetNumFields(section, nField); >> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >> for (PetscInt p = cStart; p < cEnd; p++) >> { >> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >> PetscCall(PetscSectionSetDof(section, p, nDof)); >> } >> >> PetscCall(PetscSectionSetUp(section)); >> >> DMSetLocalSection(dm, section); >> DMViewFromOptions(dm, NULL, "-dm3_view"); >> >> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >> DMViewFromOptions(dm, NULL, "-dm4_view"); >> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >> if (dm_dist) >> { >> DMDestroy(&dm); >> dm = dm_dist; >> } >> DMViewFromOptions(dm, NULL, "-dm5_view"); >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> DMDestroy(&dm); >> PetscCall(PetscFinalize());} >> >> This codes output is currently (on 2 processors) is: >> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >> >> DMView outputs: >> dm1_view (after creation): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 64 0 >> Number of 1-cells per rank: 144 0 >> Number of 2-cells per rank: 108 0 >> Number of 3-cells per rank: 27 0 >> Labels: >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >> >> dm2_view (after setfromoptions): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> >> dm3_view 
(after setting local section): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: adjacency FEM >> >> dm4_view (after setting adjacency): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: adjacency FVM++ >> >> dm5_view (after distribution): >> DM Object: Parallel Mesh 2 MPI processes >> type: plex >> Parallel Mesh in 3 dimensions: >> Number of 0-cells per rank: 64 60 >> Number of 1-cells per rank: 144 133 >> Number of 2-cells per rank: 108 98 >> Number of 3-cells per rank: 27 24 >> Labels: >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >> Field Field_0: adjacency FVM++ >> >> Thanks, >> Guer. >> >> Sent with [Proton Mail](https://proton.me/) secure email. >> >> ------- Original Message ------- >> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >> >>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>> >>>> Hi, >>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>> >>>> - DMPlexCreateBoxMesh >>>> >>>> - DMSetFromOptions >>>> - PetscSectionCreate >>>> - PetscSectionSetNumFields >>>> - PetscSectionSetFieldDof >>>> >>>> - PetscSectionSetDof >>>> >>>> - PetscSectionSetUp >>>> - DMSetLocalSection >>>> - DMSetAdjacency >>>> - DMPlexDistribute >>>> >>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>> >>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>> >>> DMViewFromOptions(dm, NULL, "-dm1_view") >>> >>> with a different string after each call. >>> >>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>> >>> Please send the actual code because the above is not specific enough. For example, you will not have >>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>> so each process gets a unique set. >>> >>> Thanks, >>> >>> Matt >>> >>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. 
I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>> >>>> Thanks again, >>>> Guer. >>>> >>>> ------- Original Message ------- >>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>> >>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>> >>>>>> Create a DMPlex mesh from GMSH. >>>>>> Reorder it with DMPlexPermute. >>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>> Create field(s) with multi-dofs. >>>>>> Create residual vectors. >>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>> >>>>> The intention was that there is enough information in the manual to do this. >>>>> >>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>> >>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thank you. >>>>>> Guer. >>>>>> >>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>> >>>>> -- >>>>> >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 11 19:07:32 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Oct 2023 20:07:32 -0400 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: On Wed, Oct 11, 2023 at 4:15?PM Brandon Denton wrote: > By natural coordinates, I am referring to the reference element > coordinates. Usually these are represented as (xi, eta, zeta) in the > literature. > > Yes. 
I would like to have the Jacobian and the derivatives of the map > available within PetscDSSetResidual() f0 and f1 functions. > Yes, we can get these passed an aux data. > I believe DMPlexComputeCellGeometryFEM() function provides this > information. Is there a way to get the cell, shape functions as well? It > not, can we talk about this more? I would like to understand how the shape > functions are addressed within PETSc. Dr. Kirk's approach uses the shape > function gradients in its SUPG parameter. I'd love to talk with you about > this is more detail. > There should be a way to formulate this in a basis independent way. I would much prefer that to explicit inclusion of the basis. Thanks, Matt > *From:* Matthew Knepley > *Sent:* Wednesday, October 11, 2023 3:13 PM > *To:* Brandon Denton > *Cc:* Jed Brown ; petsc-users > *Subject:* Re: [petsc-users] FEM Implementation of NS with SUPG > Stabilization > > On Wed, Oct 11, 2023 at 2:09?PM Brandon Denton > wrote: > > Thank you for the discussion. > > Are we agreed then that the derivatives of the natural coordinates are > required for the described approach? If so, is this something PETSc can > currently do within the point-wise residual functions? > > > I am not sure what natural coordinates are. Do we just mean the Jacobian, > derivatives of the map between reference and real coordinates? If so, yes > the Jacobian is available. Right now I do not pass it > directly, but passing it is easy. > > Thanks, > > Matt > > > Matt - Thank you for the command line option for the 2nd derivatives. > Those will be needed to implement the discussed approach. Specifically in > the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). > What is a good reference for the usual SUPG method you are referencing? > I've been looking through my textbooks but haven't found a good reference. > > Jed - Thank you for the link. I will review the information on it. > > Sorry about the attachment. I will upload it to this thread later (I'm at > work right now and I can't do it from here). > ------------------------------ > *From:* Jed Brown > *Sent:* Wednesday, October 11, 2023 1:38 PM > *To:* Matthew Knepley > *Cc:* Brandon Denton ; petsc-users < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] FEM Implementation of NS with SUPG > Stabilization > > Matthew Knepley writes: > > > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown wrote: > > > >> I don't see an attachment, but his thesis used conservative variables > and > >> defined an effective length scale in a way that seemed to assume > constant > >> shape function gradients. I'm not aware of systematic literature > comparing > >> the covariant and contravariant length measures on anisotropic meshes, > but > >> I believe most people working in the Shakib/Hughes approach use the > >> covariant measure. Our docs have a brief discussion of this choice. 
> >> > >> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 > > >> > >> Matt, I don't understand how the second derivative comes into play as a > >> length measure on anistropic meshes -- the second derivatives can be > >> uniformly zero and yet you still need a length measure. > >> > > > > I was talking about the usual SUPG where we just penalize the true > residual. > > I think you're focused on computing the strong diffusive flux (which can > be done using second derivatives or by a projection; the latter produces > somewhat better results). But you still need a length scale and that's most > naturally computed using the derivative of reference coordinates with > respect to physical (or equivalently, the associated metric tensor). > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bldenton at buffalo.edu Wed Oct 11 22:44:10 2023 From: bldenton at buffalo.edu (Brandon Denton) Date: Thu, 12 Oct 2023 03:44:10 +0000 Subject: [petsc-users] FEM Implementation of NS with SUPG Stabilization In-Reply-To: References: <87ttqx148f.fsf@jedbrown.org> <87o7h512ly.fsf@jedbrown.org> Message-ID: How exactly does the aux data work? What is typically available there? Is it something the user can populate? ________________________________ From: Matthew Knepley Sent: Wednesday, October 11, 2023 8:07 PM To: Brandon Denton Cc: Jed Brown ; petsc-users Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization On Wed, Oct 11, 2023 at 4:15?PM Brandon Denton > wrote: By natural coordinates, I am referring to the reference element coordinates. Usually these are represented as (xi, eta, zeta) in the literature. Yes. I would like to have the Jacobian and the derivatives of the map available within PetscDSSetResidual() f0 and f1 functions. Yes, we can get these passed an aux data. I believe DMPlexComputeCellGeometryFEM() function provides this information. Is there a way to get the cell, shape functions as well? It not, can we talk about this more? I would like to understand how the shape functions are addressed within PETSc. Dr. Kirk's approach uses the shape function gradients in its SUPG parameter. I'd love to talk with you about this is more detail. There should be a way to formulate this in a basis independent way. I would much prefer that to explicit inclusion of the basis. Thanks, Matt From: Matthew Knepley > Sent: Wednesday, October 11, 2023 3:13 PM To: Brandon Denton > Cc: Jed Brown >; petsc-users > Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization On Wed, Oct 11, 2023 at 2:09?PM Brandon Denton > wrote: Thank you for the discussion. 
Are we agreed then that the derivatives of the natural coordinates are required for the described approach? If so, is this something PETSc can currently do within the point-wise residual functions? I am not sure what natural coordinates are. Do we just mean the Jacobian, derivatives of the map between reference and real coordinates? If so, yes the Jacobian is available. Right now I do not pass it directly, but passing it is easy. Thanks, Matt Matt - Thank you for the command line option for the 2nd derivatives. Those will be needed to implement the discussed approach. Specifically in the stabilization and shock capture parameters. (Ref.: B. Kirk's Thesis). What is a good reference for the usual SUPG method you are referencing? I've been looking through my textbooks but haven't found a good reference. Jed - Thank you for the link. I will review the information on it. Sorry about the attachment. I will upload it to this thread later (I'm at work right now and I can't do it from here). ________________________________ From: Jed Brown > Sent: Wednesday, October 11, 2023 1:38 PM To: Matthew Knepley > Cc: Brandon Denton >; petsc-users > Subject: Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization Matthew Knepley > writes: > On Wed, Oct 11, 2023 at 1:03?PM Jed Brown > wrote: > >> I don't see an attachment, but his thesis used conservative variables and >> defined an effective length scale in a way that seemed to assume constant >> shape function gradients. I'm not aware of systematic literature comparing >> the covariant and contravariant length measures on anisotropic meshes, but >> I believe most people working in the Shakib/Hughes approach use the >> covariant measure. Our docs have a brief discussion of this choice. >> >> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibceed.org%2Fen%2Flatest%2Fexamples%2Ffluids%2F%23equation-eq-peclet&data=05%7C01%7Cbldenton%40buffalo.edu%7Cd9372f934b26455371a708dbca80dc8e%7C96464a8af8ed40b199e25f6b50a20250%7C0%7C0%7C638326427028053956%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=skMsKDmpBxiaXtBSqhsyckvVpTOkGqDsNJIYo22Ywps%3D&reserved=0 >> >> Matt, I don't understand how the second derivative comes into play as a >> length measure on anistropic meshes -- the second derivatives can be >> uniformly zero and yet you still need a length measure. >> > > I was talking about the usual SUPG where we just penalize the true residual. I think you're focused on computing the strong diffusive flux (which can be done using second derivatives or by a projection; the latter produces somewhat better results). But you still need a length scale and that's most naturally computed using the derivative of reference coordinates with respect to physical (or equivalently, the associated metric tensor). -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Oct 12 13:12:08 2023 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Thu, 12 Oct 2023 20:12:08 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> Message-ID: <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> I am attaching your example modified with the context stuff. With the PETSc branch that I indicated, now it works with NLEIGS, for instance: $ ./test_nep -nep_nleigs_ksp_type gmres -nep_nleigs_pc_type none -rg_interval_endpoints 0.2,1.1 -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_view -nep_error_relative ::ascii_info_detail And also other solvers such as SLP: $ ./test_nep -nep_type slp -nep_slp_ksp_type gmres -nep_slp_pc_type none -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_error_relative ::ascii_info_detail I will clean the example code an add it as a SLEPc example. Regards, Jose > El 11 oct 2023, a las 17:27, Kenneth C Hall escribi?: > > Jose, > > Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. > > Thanks, and best regards, > Kenneth > -------------- next part -------------- A non-text attachment was scrubbed... Name: test_nep.F90 Type: application/octet-stream Size: 8471 bytes Desc: not available URL: From kenneth.c.hall at duke.edu Thu Oct 12 13:59:34 2023 From: kenneth.c.hall at duke.edu (Kenneth C Hall) Date: Thu, 12 Oct 2023 18:59:34 +0000 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> Message-ID: Jose, Thanks very much for this. I will give it a try and let you know how it works. Best regards, Kenneth From: Jose E. Roman Date: Thursday, October 12, 2023 at 2:12 PM To: Kenneth C Hall Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) I am attaching your example modified with the context stuff. With the PETSc branch that I indicated, now it works with NLEIGS, for instance: $ ./test_nep -nep_nleigs_ksp_type gmres -nep_nleigs_pc_type none -rg_interval_endpoints 0.2,1.1 -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_view -nep_error_relative ::ascii_info_detail And also other solvers such as SLP: $ ./test_nep -nep_type slp -nep_slp_ksp_type gmres -nep_slp_pc_type none -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_error_relative ::ascii_info_detail I will clean the example code an add it as a SLEPc example. Regards, Jose > El 11 oct 2023, a las 17:27, Kenneth C Hall escribi?: > > Jose, > > Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. > > Thanks, and best regards, > Kenneth > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Fri Oct 13 06:26:39 2023 From: erdemguer at proton.me (erdemguer) Date: Fri, 13 Oct 2023 11:26:39 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: Hi, unfortunately it's me again. I have some weird troubles with creating matrix with DMPlex. 
Actually I might not need to create the matrix explicitly, but SNESSolve crashes there too. So, I updated the code you provided. When I first tried to use DMCreateMatrix(), I got the error "Unknown discretization type for field 0"; after I applied DMSetLocalSection() that error went away. But now when I run the code with multiple processors, I sometimes get output like:
Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27
Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0
[1] ghost cell 14
[1] ghost cell 15
[1] ghost cell 16
[1] ghost cell 17
[1] ghost cell 18
[1] ghost cell 19
[1] ghost cell 20
[1] ghost cell 21
[1] ghost cell 22
After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23
[0] ghost cell 13
[0] ghost cell 14
[0] ghost cell 15
[0] ghost cell 16
[0] ghost cell 17
[0] ghost cell 18
[0] ghost cell 19
[0] ghost cell 20
[0] ghost cell 21
[0] ghost cell 22
[0] ghost cell 23
After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24
Fatal error in internal_Waitall: Unknown error class, error stack:
internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed
MPIR_Waitall(1099)........................:
MPIR_Waitall_impl(1011)...................:
MPIR_Waitall_state(976)...................:
MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(411):
ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880
MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address)

Sometimes the error message doesn't appear, but, for example, printing the size of the matrix still doesn't work.
If necessary, my configure options are --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1

Version: Petsc Release Version 3.20.0

Thank you,
Guer

Sent with [Proton Mail](https://proton.me/) secure email.

------- Original Message -------
On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote:
> Thank you! That's exactly what I need.
>
> Sent with [Proton Mail](https://proton.me/) secure email.
>
> ------- Original Message -------
> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote:
>
>> On Wed, Oct 11, 2023 at 4:42 AM erdemguer wrote:
>>
>>> Hi again,
>>
>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached.
>> >> Thanks, >> >> Matt >> >>> Here is my code: >>> #include >>> static char help[] = "dmplex"; >>> >>> int main(int argc, char **argv) >>> { >>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>> DM dm, dm_dist; >>> PetscSection section; >>> PetscInt cStart, cEndInterior, cEnd, rank; >>> PetscInt nc[3] = {3, 3, 3}; >>> PetscReal upper[3] = {1, 1, 1}; >>> >>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>> >>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>> PetscCall(DMSetFromOptions(dm)); >>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>> >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> PetscInt nField = 1, nDof = 3, field = 0; >>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>> PetscSectionSetNumFields(section, nField); >>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>> for (PetscInt p = cStart; p < cEnd; p++) >>> { >>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>> } >>> >>> PetscCall(PetscSectionSetUp(section)); >>> >>> DMSetLocalSection(dm, section); >>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>> >>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>> if (dm_dist) >>> { >>> DMDestroy(&dm); >>> dm = dm_dist; >>> } >>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> DMDestroy(&dm); >>> PetscCall(PetscFinalize());} >>> >>> This codes output is currently (on 2 processors) is: >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>> >>> DMView outputs: >>> dm1_view (after creation): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 64 0 >>> Number of 1-cells per rank: 144 0 >>> Number of 2-cells per rank: 108 0 >>> Number of 3-cells per rank: 27 0 >>> Labels: >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>> >>> dm2_view (after setfromoptions): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 
(5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> >>> dm3_view (after setting local section): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: adjacency FEM >>> >>> dm4_view (after setting adjacency): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: adjacency FVM++ >>> >>> dm5_view (after distribution): >>> DM Object: Parallel Mesh 2 MPI processes >>> type: plex >>> Parallel Mesh in 3 dimensions: >>> Number of 0-cells per rank: 64 60 >>> Number of 1-cells per rank: 144 133 >>> Number of 2-cells per rank: 108 98 >>> Number of 3-cells per rank: 27 24 >>> Labels: >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>> Field Field_0: adjacency FVM++ >>> >>> Thanks, >>> Guer. >>> >>> Sent with [Proton Mail](https://proton.me/) secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>> >>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>> >>>>> Hi, >>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>>> >>>>> - DMPlexCreateBoxMesh >>>>> >>>>> - DMSetFromOptions >>>>> - PetscSectionCreate >>>>> - PetscSectionSetNumFields >>>>> - PetscSectionSetFieldDof >>>>> >>>>> - PetscSectionSetDof >>>>> >>>>> - PetscSectionSetUp >>>>> - DMSetLocalSection >>>>> - DMSetAdjacency >>>>> - DMPlexDistribute >>>>> >>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>> >>>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>>> >>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>> >>>> with a different string after each call. >>>> >>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>> >>>> Please send the actual code because the above is not specific enough. For example, you will not have >>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>> so each process gets a unique set. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. 
I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>> >>>>> Thanks again, >>>>> Guer. >>>>> >>>>> ------- Original Message ------- >>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>> >>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>> >>>>>>> Create a DMPlex mesh from GMSH. >>>>>>> Reorder it with DMPlexPermute. >>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>> Create field(s) with multi-dofs. >>>>>>> Create residual vectors. >>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>> >>>>>> The intention was that there is enough information in the manual to do this. >>>>>> >>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>> >>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Thank you. >>>>>>> Guer. >>>>>>> >>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>> >>>>>> -- >>>>>> >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>> >>>> -- >>>> >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >> >> -- >> >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: ex1.c URL: From knepley at gmail.com Fri Oct 13 07:00:01 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Oct 2023 08:00:01 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: > Hi, unfortunately it's me again. > > I have some weird troubles with creating matrix with DMPlex. Actually I > might not need to create matrix explicitly, but SNESSolve crashes at there > too. So, I updated the code you provided. When I tried to use > DMCreateMatrix() at first, I got an error "Unknown discretization type > for field 0" at first I applied DMSetLocalSection() and this error is gone. > But this time when I run the code with multiple processors, sometimes I got > an output like: > Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. I have fixed it and attached. Thanks, Matt Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > [1] ghost cell 14 > [1] ghost cell 15 > [1] ghost cell 16 > [1] ghost cell 17 > [1] ghost cell 18 > [1] ghost cell 19 > [1] ghost cell 20 > [1] ghost cell 21 > [1] ghost cell 22 > After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 > [0] ghost cell 13 > [0] ghost cell 14 > [0] ghost cell 15 > [0] ghost cell 16 > [0] ghost cell 17 > [0] ghost cell 18 > [0] ghost cell 19 > [0] ghost cell 20 > [0] ghost cell 21 > [0] ghost cell 22 > [0] ghost cell 23 > After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 > Fatal error in internal_Waitall: Unknown error class, error stack: > internal_Waitall(82)......................: MPI_Waitall(count=1, > array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed > MPIR_Waitall(1099)........................: > MPIR_Waitall_impl(1011)...................: > MPIR_Waitall_state(976)...................: > MPIDI_CH3i_Progress_wait(187).............: an error occurred while > handling an event returned by MPIDI_CH3I_Sock_Wait() > MPIDI_CH3I_Progress_handle_sock_event(411): > ReadMoreData(744).........................: ch3|sock|immedread > 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 > MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains > invalid memory (set=0,sock=1,errno=14:Bad address) > > Sometimes the error message isn't appearing but for example I'm trying to > print size of the matrix but it isn't working. > If necessary, my Configure options --download-mpich --download-hwloc > --download-pastix --download-hypre --download-ml --download-ctetgen > --download-triangle --download-exodusii --download-netcdf --download-zlib > --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 > --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g > -O2" --with-debugging=1 > > Version: Petsc Release Version 3.20.0 > > Thank you, > Guer > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < > erdemguer at proton.me> wrote: > > Thank you! That's exactly what I need. > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: > >> Hi again, >> > > I see the problem. 
FV ghosts mean extra boundary cells added in FV methods > using DMPlexCreateGhostCells() in order to impose boundary conditions. They > are not the "ghost" cells for overlapping parallel decompositions. I have > changed your code to give you what you want. It is attached. > > Thanks, > > Matt > >> Here is my code: >> #include >> static char help[] = "dmplex"; >> >> int main(int argc, char **argv) >> { >> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >> DM dm, dm_dist; >> PetscSection section; >> PetscInt cStart, cEndInterior, cEnd, rank; >> PetscInt nc[3] = {3, 3, 3}; >> PetscReal upper[3] = {1, 1, 1}; >> >> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >> >> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >> NULL, PETSC_TRUE, &dm); >> DMViewFromOptions(dm, NULL, "-dm1_view"); >> PetscCall(DMSetFromOptions(dm)); >> DMViewFromOptions(dm, NULL, "-dm2_view"); >> >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >> &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, >> cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> PetscInt nField = 1, nDof = 3, field = 0; >> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >> PetscSectionSetNumFields(section, nField); >> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >> for (PetscInt p = cStart; p < cEnd; p++) >> { >> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >> PetscCall(PetscSectionSetDof(section, p, nDof)); >> } >> >> PetscCall(PetscSectionSetUp(section)); >> >> DMSetLocalSection(dm, section); >> DMViewFromOptions(dm, NULL, "-dm3_view"); >> >> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >> DMViewFromOptions(dm, NULL, "-dm4_view"); >> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >> if (dm_dist) >> { >> DMDestroy(&dm); >> dm = dm_dist; >> } >> DMViewFromOptions(dm, NULL, "-dm5_view"); >> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >> DMPlexComputeCellTypes(dm); >> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >> &cEndInterior, NULL)); >> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >> cEndInterior: %d, cEnd: %d\n", rank, cStart, >> cEndInterior, cEnd); >> >> DMDestroy(&dm); >> PetscCall(PetscFinalize()); >> } >> >> This codes output is currently (on 2 processors) is: >> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >> >> DMView outputs: >> dm1_view (after creation): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 64 0 >> Number of 1-cells per rank: 144 0 >> Number of 2-cells per rank: 108 0 >> Number of 3-cells per rank: 27 0 >> Labels: >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 >> (9)) >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >> >> dm2_view (after setfromoptions): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells 
per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> >> dm3_view (after setting local section): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: >> adjacency FEM >> >> dm4_view (after setting adjacency): >> DM Object: 2 MPI processes >> type: plex >> DM_0x84000004_0 in 3 dimensions: >> Number of 0-cells per rank: 40 46 >> Number of 1-cells per rank: 83 95 >> Number of 2-cells per rank: 57 64 >> Number of 3-cells per rank: 13 14 >> Labels: >> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >> marker: 1 strata with value/size (1 (109)) >> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >> Field Field_0: >> adjacency FVM++ >> >> dm5_view (after distribution): >> DM Object: Parallel Mesh 2 MPI processes >> type: plex >> Parallel Mesh in 3 dimensions: >> Number of 0-cells per rank: 64 60 >> Number of 1-cells per rank: 144 133 >> Number of 2-cells per rank: 108 98 >> Number of 3-cells per rank: 27 24 >> Labels: >> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >> marker: 1 strata with value/size (1 (218)) >> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 >> (9)) >> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >> Field Field_0: >> adjacency FVM++ >> >> Thanks, >> Guer. >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >> >>> >>> Hi, >>> Sorry for my late response. I tried with your suggestions and I think I >>> made a progress. But I still got issues. Let me explain my latest mesh >>> routine: >>> >>> >>> 1. DMPlexCreateBoxMesh >>> 2. DMSetFromOptions >>> 3. PetscSectionCreate >>> 4. PetscSectionSetNumFields >>> 5. PetscSectionSetFieldDof >>> 6. PetscSectionSetDof >>> 7. PetscSectionSetUp >>> 8. DMSetLocalSection >>> 9. DMSetAdjacency >>> 10. DMPlexDistribute >>> >>> >>> It's still not working but it's promising, if I call >>> DMPlexGetDepthStratum for cells, I can see that after distribution >>> processors have more cells. >>> >> >> Please send the output of DMPlexView() for each incarnation of the mesh. >> What I do is put >> >> DMViewFromOptions(dm, NULL, "-dm1_view") >> >> >> with a different string after each call. >> >>> But I couldn't figure out how to decide where the ghost/processor >>> boundary cells start. >>> >> >> Please send the actual code because the above is not specific enough. For >> example, you will not have >> "ghost cells" unless you partition with overlap. This is because by >> default cells are the partitioned quantity, >> so each process gets a unique set. 
>> >> Thanks, >> >> Matt >> >>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>> think that function is deprecated. I tried to use, >>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting >>> -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>> after distribution but I think there is a better way I'm currently missing. >>> >>> Thanks again, >>> Guer. >>> >>> ------- Original Message ------- >>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> Hi, >>>> >>>> I am currently using DMPlex in my code. It runs serially at the moment, >>>> but I'm interested in adding parallel options. Here is my workflow: >>>> >>>> Create a DMPlex mesh from GMSH. >>>> Reorder it with DMPlexPermute. >>>> Create necessary pre-processing arrays related to the mesh/problem. >>>> Create field(s) with multi-dofs. >>>> Create residual vectors. >>>> Define a function to calculate the residual for each cell and, use SNES. >>>> As you can see, I'm not using FV or FE structures (most examples do). >>>> Now, I'm trying to implement this in parallel using a similar approach. >>>> However, I'm struggling to understand how to create corresponding vectors >>>> and how to obtain index sets for each processor. Is there a tutorial or >>>> paper that covers this topic? >>>> >>> >>> The intention was that there is enough information in the manual to do >>> this. >>> >>> Using PetscFE/PetscFV is not required. However, I strongly encourage you >>> to use PetscSection. Without this, it would be incredibly hard to do what >>> you want. Once the DM has a Section, it can do things like automatically >>> create vectors and matrices for you. It can redistribute them, subset them, >>> etc. The Section describes how dofs are assigned to pieces of the mesh >>> (mesh points). This is in the manual, and there are a few examples that do >>> it by hand. >>> >>> So I suggest changing your code to use PetscSection, and then letting us >>> know if things still do not work. >>> >>> Thanks, >>> >>> Matt >>> >>>> Thank you. >>>> Guer. >>>> >>>> Sent with Proton Mail secure email. >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: application/octet-stream Size: 3127 bytes Desc: not available URL: From erdemguer at proton.me Mon Oct 16 05:54:17 2023 From: erdemguer at proton.me (erdemguer) Date: Mon, 16 Oct 2023 10:54:17 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: Hey again. This code outputs for example: After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 [0] m: 39 n: 39[1] m: 42 n: 42 Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on processor boundaries? P.S. It looks like I should use PetscFV or something like that at the first place. At first I thought, "I will just use SNES, I will compute only residual and jacobian on cells so why do bother with PetscFV?" So Thanks, E. Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley wrote: > On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: > >> Hi, unfortunately it's me again. >> >> I have some weird troubles with creating matrix with DMPlex. Actually I might not need to create matrix explicitly, but SNESSolve crashes at there too. So, I updated the code you provided. When I tried to use DMCreateMatrix() at first, I got an error "Unknown discretization type for field 0" at first I applied DMSetLocalSection() and this error is gone. But this time when I run the code with multiple processors, sometimes I got an output like: > > Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. > I have fixed it and attached. 
> > Thanks, > > Matt > >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> [1] ghost cell 14 >> [1] ghost cell 15 >> [1] ghost cell 16 >> [1] ghost cell 17 >> [1] ghost cell 18 >> [1] ghost cell 19 >> [1] ghost cell 20 >> [1] ghost cell 21 >> [1] ghost cell 22 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >> [0] ghost cell 13 >> [0] ghost cell 14 >> [0] ghost cell 15 >> [0] ghost cell 16 >> [0] ghost cell 17 >> [0] ghost cell 18 >> [0] ghost cell 19 >> [0] ghost cell 20 >> [0] ghost cell 21 >> [0] ghost cell 22 >> [0] ghost cell 23 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >> Fatal error in internal_Waitall: Unknown error class, error stack: >> internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >> MPIR_Waitall(1099)........................: >> MPIR_Waitall_impl(1011)...................: >> MPIR_Waitall_state(976)...................: >> MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait() >> MPIDI_CH3I_Progress_handle_sock_event(411): >> ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address) >> >> Sometimes the error message isn't appearing but for example I'm trying to print size of the matrix but it isn't working. >> If necessary, my Configure options --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1 >> >> Version: Petsc Release Version 3.20.0 >> >> Thank you, >> Guer >> >> Sent with [Proton Mail](https://proton.me/) secure email. >> >> ------- Original Message ------- >> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote: >> >>> Thank you! That's exactly what I need. >>> >>> Sent with [Proton Mail](https://proton.me/) secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: >>> >>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>> >>>>> Hi again, >>>> >>>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
>>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> Here is my code: >>>>> #include >>>>> static char help[] = "dmplex"; >>>>> >>>>> int main(int argc, char **argv) >>>>> { >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>> DM dm, dm_dist; >>>>> PetscSection section; >>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>> PetscInt nc[3] = {3, 3, 3}; >>>>> PetscReal upper[3] = {1, 1, 1}; >>>>> >>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>> >>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>> PetscCall(DMSetFromOptions(dm)); >>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>> >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>> PetscSectionSetNumFields(section, nField); >>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>> { >>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>> } >>>>> >>>>> PetscCall(PetscSectionSetUp(section)); >>>>> >>>>> DMSetLocalSection(dm, section); >>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>> >>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>> if (dm_dist) >>>>> { >>>>> DMDestroy(&dm); >>>>> dm = dm_dist; >>>>> } >>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> DMDestroy(&dm); >>>>> PetscCall(PetscFinalize());} >>>>> >>>>> This codes output is currently (on 2 processors) is: >>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>> >>>>> DMView outputs: >>>>> dm1_view (after creation): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 64 0 >>>>> Number of 1-cells per rank: 144 0 >>>>> Number of 2-cells per rank: 108 0 >>>>> Number of 3-cells per rank: 27 0 >>>>> Labels: >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>>> >>>>> dm2_view (after setfromoptions): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: 
>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> >>>>> dm3_view (after setting local section): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: adjacency FEM >>>>> >>>>> dm4_view (after setting adjacency): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: adjacency FVM++ >>>>> >>>>> dm5_view (after distribution): >>>>> DM Object: Parallel Mesh 2 MPI processes >>>>> type: plex >>>>> Parallel Mesh in 3 dimensions: >>>>> Number of 0-cells per rank: 64 60 >>>>> Number of 1-cells per rank: 144 133 >>>>> Number of 2-cells per rank: 108 98 >>>>> Number of 3-cells per rank: 27 24 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>> Field Field_0: adjacency FVM++ >>>>> >>>>> Thanks, >>>>> Guer. >>>>> >>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>> >>>>> ------- Original Message ------- >>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>>>> >>>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>>> >>>>>>> Hi, >>>>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>>>>> >>>>>>> - DMPlexCreateBoxMesh >>>>>>> >>>>>>> - DMSetFromOptions >>>>>>> - PetscSectionCreate >>>>>>> - PetscSectionSetNumFields >>>>>>> - PetscSectionSetFieldDof >>>>>>> >>>>>>> - PetscSectionSetDof >>>>>>> >>>>>>> - PetscSectionSetUp >>>>>>> - DMSetLocalSection >>>>>>> - DMSetAdjacency >>>>>>> - DMPlexDistribute >>>>>>> >>>>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>>>> >>>>>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>>>>> >>>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>>> >>>>>> with a different string after each call. >>>>>> >>>>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>>>> >>>>>> Please send the actual code because the above is not specific enough. 
For example, you will not have >>>>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>>>> so each process gets a unique set. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>>>> >>>>>>> Thanks again, >>>>>>> Guer. >>>>>>> >>>>>>> ------- Original Message ------- >>>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>>>> >>>>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>>>> >>>>>>>>> Create a DMPlex mesh from GMSH. >>>>>>>>> Reorder it with DMPlexPermute. >>>>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>>>> Create field(s) with multi-dofs. >>>>>>>>> Create residual vectors. >>>>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>>>> >>>>>>>> The intention was that there is enough information in the manual to do this. >>>>>>>> >>>>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>>>> >>>>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> Thank you. >>>>>>>>> Guer. >>>>>>>>> >>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>> >>>>>> -- >>>>>> >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>> -- Norbert Wiener >>>>>> >>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>> >>>> -- >>>> >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 16 08:11:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 09:11:58 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: > Hey again. > > This code outputs for example: > > After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 > After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 > [0] m: 39 n: 39 > [1] m: 42 n: 42 > > Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on > processor boundaries? > Here is my output master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_refine 1 Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 [0] m: 48 n: 48 [1] m: 48 n: 48 The mesh is 4x4 and also split into two triangles, so 32 triangles. Then we split it and have 8 overlap cells on each side. You can get quads using master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 [0] m: 24 n: 24 [1] m: 24 n: 24 It is the same 4x4 mesh, but now with quads. Thanks, Matt P.S. It looks like I should use PetscFV or something like that at the first > place. At first I thought, "I will just use SNES, I will compute only > residual and jacobian on cells so why do bother with PetscFV?" So > > Thanks, > E. > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: > >> Hi, unfortunately it's me again. >> >> I have some weird troubles with creating matrix with DMPlex. Actually I >> might not need to create matrix explicitly, but SNESSolve crashes at there >> too. So, I updated the code you provided. When I tried to use >> DMCreateMatrix() at first, I got an error "Unknown discretization type >> for field 0" at first I applied DMSetLocalSection() and this error is gone. 
>> But this time when I run the code with multiple processors, sometimes I got >> an output like: >> > > Some setup was out of order so the section size on proc1 was 0, and I was > not good about checking this. > I have fixed it and attached. > > Thanks, > > Matt > > Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> [1] ghost cell 14 >> [1] ghost cell 15 >> [1] ghost cell 16 >> [1] ghost cell 17 >> [1] ghost cell 18 >> [1] ghost cell 19 >> [1] ghost cell 20 >> [1] ghost cell 21 >> [1] ghost cell 22 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >> [0] ghost cell 13 >> [0] ghost cell 14 >> [0] ghost cell 15 >> [0] ghost cell 16 >> [0] ghost cell 17 >> [0] ghost cell 18 >> [0] ghost cell 19 >> [0] ghost cell 20 >> [0] ghost cell 21 >> [0] ghost cell 22 >> [0] ghost cell 23 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >> Fatal error in internal_Waitall: Unknown error class, error stack: >> internal_Waitall(82)......................: MPI_Waitall(count=1, >> array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >> MPIR_Waitall(1099)........................: >> MPIR_Waitall_impl(1011)...................: >> MPIR_Waitall_state(976)...................: >> MPIDI_CH3i_Progress_wait(187).............: an error occurred while >> handling an event returned by MPIDI_CH3I_Sock_Wait() >> MPIDI_CH3I_Progress_handle_sock_event(411): >> ReadMoreData(744).........................: ch3|sock|immedread >> 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 >> MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains >> invalid memory (set=0,sock=1,errno=14:Bad address) >> >> Sometimes the error message isn't appearing but for example I'm trying to >> print size of the matrix but it isn't working. >> If necessary, my Configure options --download-mpich --download-hwloc >> --download-pastix --download-hypre --download-ml --download-ctetgen >> --download-triangle --download-exodusii --download-netcdf --download-zlib >> --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 >> --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g >> -O2" --with-debugging=1 >> >> Version: Petsc Release Version 3.20.0 >> >> Thank you, >> Guer >> >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < >> erdemguer at proton.me> wrote: >> >> Thank you! That's exactly what I need. >> >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >> >>> Hi again, >>> >> >> I see the problem. FV ghosts mean extra boundary cells added in FV >> methods using DMPlexCreateGhostCells() in order to impose boundary >> conditions. They are not the "ghost" cells for overlapping parallel >> decompositions. I have changed your code to give you what you want. It is >> attached. 
>> >> Thanks, >> >> Matt >> >>> Here is my code: >>> #include >>> static char help[] = "dmplex"; >>> >>> int main(int argc, char **argv) >>> { >>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>> DM dm, dm_dist; >>> PetscSection section; >>> PetscInt cStart, cEndInterior, cEnd, rank; >>> PetscInt nc[3] = {3, 3, 3}; >>> PetscReal upper[3] = {1, 1, 1}; >>> >>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>> >>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >>> NULL, PETSC_TRUE, &dm); >>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>> PetscCall(DMSetFromOptions(dm)); >>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>> >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >>> &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, >>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> PetscInt nField = 1, nDof = 3, field = 0; >>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>> PetscSectionSetNumFields(section, nField); >>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>> for (PetscInt p = cStart; p < cEnd; p++) >>> { >>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>> } >>> >>> PetscCall(PetscSectionSetUp(section)); >>> >>> DMSetLocalSection(dm, section); >>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>> >>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>> if (dm_dist) >>> { >>> DMDestroy(&dm); >>> dm = dm_dist; >>> } >>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>> DMPlexComputeCellTypes(dm); >>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >>> &cEndInterior, NULL)); >>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>> cEndInterior, cEnd); >>> >>> DMDestroy(&dm); >>> PetscCall(PetscFinalize()); >>> } >>> >>> This codes output is currently (on 2 processors) is: >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>> >>> DMView outputs: >>> dm1_view (after creation): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 64 0 >>> Number of 1-cells per rank: 144 0 >>> Number of 2-cells per rank: 108 0 >>> Number of 3-cells per rank: 27 0 >>> Labels: >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), >>> 2 (9)) >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>> >>> dm2_view (after setfromoptions): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with 
value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> >>> dm3_view (after setting local section): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: >>> adjacency FEM >>> >>> dm4_view (after setting adjacency): >>> DM Object: 2 MPI processes >>> type: plex >>> DM_0x84000004_0 in 3 dimensions: >>> Number of 0-cells per rank: 40 46 >>> Number of 1-cells per rank: 83 95 >>> Number of 2-cells per rank: 57 64 >>> Number of 3-cells per rank: 13 14 >>> Labels: >>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>> marker: 1 strata with value/size (1 (109)) >>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>> Field Field_0: >>> adjacency FVM++ >>> >>> dm5_view (after distribution): >>> DM Object: Parallel Mesh 2 MPI processes >>> type: plex >>> Parallel Mesh in 3 dimensions: >>> Number of 0-cells per rank: 64 60 >>> Number of 1-cells per rank: 144 133 >>> Number of 2-cells per rank: 108 98 >>> Number of 3-cells per rank: 27 24 >>> Labels: >>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>> marker: 1 strata with value/size (1 (218)) >>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), >>> 6 (9)) >>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>> Field Field_0: >>> adjacency FVM++ >>> >>> Thanks, >>> Guer. >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>> >>>> >>>> Hi, >>>> Sorry for my late response. I tried with your suggestions and I think I >>>> made a progress. But I still got issues. Let me explain my latest mesh >>>> routine: >>>> >>>> >>>> 1. DMPlexCreateBoxMesh >>>> 2. DMSetFromOptions >>>> 3. PetscSectionCreate >>>> 4. PetscSectionSetNumFields >>>> 5. PetscSectionSetFieldDof >>>> 6. PetscSectionSetDof >>>> 7. PetscSectionSetUp >>>> 8. DMSetLocalSection >>>> 9. DMSetAdjacency >>>> 10. DMPlexDistribute >>>> >>>> >>>> It's still not working but it's promising, if I call >>>> DMPlexGetDepthStratum for cells, I can see that after distribution >>>> processors have more cells. >>>> >>> >>> Please send the output of DMPlexView() for each incarnation of the mesh. >>> What I do is put >>> >>> DMViewFromOptions(dm, NULL, "-dm1_view") >>> >>> >>> with a different string after each call. >>> >>>> But I couldn't figure out how to decide where the ghost/processor >>>> boundary cells start. >>>> >>> >>> Please send the actual code because the above is not specific enough. >>> For example, you will not have >>> "ghost cells" unless you partition with overlap. This is because by >>> default cells are the partitioned quantity, >>> so each process gets a unique set. 
>>> >>> Thanks, >>> >>> Matt >>> >>>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>>> think that function is deprecated. I tried to use, >>>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting >>>> -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>>> after distribution but I think there is a better way I'm currently missing. >>>> >>>> Thanks again, >>>> Guer. >>>> >>>> ------- Original Message ------- >>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>>> knepley at gmail.com> wrote: >>>> >>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> Hi, >>>>> >>>>> I am currently using DMPlex in my code. It runs serially at the >>>>> moment, but I'm interested in adding parallel options. Here is my workflow: >>>>> >>>>> Create a DMPlex mesh from GMSH. >>>>> Reorder it with DMPlexPermute. >>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>> Create field(s) with multi-dofs. >>>>> Create residual vectors. >>>>> Define a function to calculate the residual for each cell and, use >>>>> SNES. >>>>> As you can see, I'm not using FV or FE structures (most examples do). >>>>> Now, I'm trying to implement this in parallel using a similar approach. >>>>> However, I'm struggling to understand how to create corresponding vectors >>>>> and how to obtain index sets for each processor. Is there a tutorial or >>>>> paper that covers this topic? >>>>> >>>> >>>> The intention was that there is enough information in the manual to do >>>> this. >>>> >>>> Using PetscFE/PetscFV is not required. However, I strongly encourage >>>> you to use PetscSection. Without this, it would be incredibly hard to do >>>> what you want. Once the DM has a Section, it can do things like >>>> automatically create vectors and matrices for you. It can redistribute >>>> them, subset them, etc. The Section describes how dofs are assigned to >>>> pieces of the mesh (mesh points). This is in the manual, and there are a >>>> few examples that do it by hand. >>>> >>>> So I suggest changing your code to use PetscSection, and then letting >>>> us know if things still do not work. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> Thank you. >>>>> Guer. >>>>> >>>>> Sent with Proton Mail secure email. >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Mon Oct 16 08:22:43 2023 From: erdemguer at proton.me (erdemguer) Date: Mon, 16 Oct 2023 13:22:43 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: Thank you for your responses many times. Looks like I'm missing something, sorry for my confusion, but let's take processor 0 on your first output. cEndInterior: 16 and cEnd: 24. I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 (col = 56) have influence on it. (Cell 18 is on processor boundary) Shouldn't I have to write values on the (42,56)? Thanks, Guer Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley wrote: > On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: > >> Hey again. >> >> This code outputs for example: >> >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >> [0] m: 39 n: 39[1] m: 42 n: 42 >> >> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on processor boundaries? > > Here is my output > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_refine 1 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 > After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 > After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 > [0] m: 48 n: 48 > [1] m: 48 n: 48 > > The mesh is 4x4 and also split into two triangles, so 32 triangles. Then we split it and have 8 overlap cells on each side. You can get quads using > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 > After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 > After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 > [0] m: 24 n: 24 > [1] m: 24 n: 24 > > It is the same 4x4 mesh, but now with quads. > > Thanks, > > Matt > >> P.S. It looks like I should use PetscFV or something like that at the first place. At first I thought, "I will just use SNES, I will compute only residual and jacobian on cells so why do bother with PetscFV?" So >> >> Thanks, >> E. >> >> Sent with [Proton Mail](https://proton.me/) secure email. >> >> ------- Original Message ------- >> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley wrote: >> >>> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >>> >>>> Hi, unfortunately it's me again. >>>> >>>> I have some weird troubles with creating matrix with DMPlex. 
Actually I might not need to create matrix explicitly, but SNESSolve crashes at there too. So, I updated the code you provided. When I tried to use DMCreateMatrix() at first, I got an error "Unknown discretization type for field 0" at first I applied DMSetLocalSection() and this error is gone. But this time when I run the code with multiple processors, sometimes I got an output like: >>> >>> Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. >>> I have fixed it and attached. >>> >>> Thanks, >>> >>> Matt >>> >>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>>> [1] ghost cell 14 >>>> [1] ghost cell 15 >>>> [1] ghost cell 16 >>>> [1] ghost cell 17 >>>> [1] ghost cell 18 >>>> [1] ghost cell 19 >>>> [1] ghost cell 20 >>>> [1] ghost cell 21 >>>> [1] ghost cell 22 >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>>> [0] ghost cell 13 >>>> [0] ghost cell 14 >>>> [0] ghost cell 15 >>>> [0] ghost cell 16 >>>> [0] ghost cell 17 >>>> [0] ghost cell 18 >>>> [0] ghost cell 19 >>>> [0] ghost cell 20 >>>> [0] ghost cell 21 >>>> [0] ghost cell 22 >>>> [0] ghost cell 23 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>>> Fatal error in internal_Waitall: Unknown error class, error stack: >>>> internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>>> MPIR_Waitall(1099)........................: >>>> MPIR_Waitall_impl(1011)...................: >>>> MPIR_Waitall_state(976)...................: >>>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait() >>>> MPIDI_CH3I_Progress_handle_sock_event(411): >>>> ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address) >>>> >>>> Sometimes the error message isn't appearing but for example I'm trying to print size of the matrix but it isn't working. >>>> If necessary, my Configure options --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1 >>>> >>>> Version: Petsc Release Version 3.20.0 >>>> >>>> Thank you, >>>> Guer >>>> >>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>> >>>> ------- Original Message ------- >>>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote: >>>> >>>>> Thank you! That's exactly what I need. >>>>> >>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>> >>>>> ------- Original Message ------- >>>>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: >>>>> >>>>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>>>> >>>>>>> Hi again, >>>>>> >>>>>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
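One way to tell the parallel-overlap "ghosts" apart from FV boundary ghosts programmatically: cells that a rank only copies from a neighboring rank appear as leaves of the DM's point SF. A small helper along those lines (an illustration, not the attached file) could be:

    #include <petscdmplex.h>

    /* Count owned vs. overlap (unowned) cells: any cell point that is a leaf of
       the point SF is a copy of a cell owned by another rank. */
    static PetscErrorCode CountOwnedCells(DM dm, PetscInt *nOwned, PetscInt *nOverlap)
    {
      PetscSF         sf;
      PetscInt        cStart, cEnd, nleaves, l;
      const PetscInt *ilocal;

      PetscFunctionBeginUser;
      PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
      PetscCall(DMGetPointSF(dm, &sf));
      PetscCall(PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL));
      if (nleaves < 0) nleaves = 0; /* graph not set: purely local mesh */
      *nOverlap = 0;
      for (l = 0; l < nleaves; ++l) {
        const PetscInt p = ilocal ? ilocal[l] : l; /* NULL ilocal means leaves are 0..nleaves-1 */
        if (p >= cStart && p < cEnd) ++(*nOverlap);
      }
      *nOwned = (cEnd - cStart) - *nOverlap;
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Whether one uses the SF or a cEndInterior-style split is a bookkeeping choice; the SF query works even if the overlap cells are not numbered contiguously.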
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Here is my code: >>>>>>> #include >>>>>>> static char help[] = "dmplex"; >>>>>>> >>>>>>> int main(int argc, char **argv) >>>>>>> { >>>>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>>>> DM dm, dm_dist; >>>>>>> PetscSection section; >>>>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>>>> PetscInt nc[3] = {3, 3, 3}; >>>>>>> PetscReal upper[3] = {1, 1, 1}; >>>>>>> >>>>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>>>> >>>>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>>>> PetscCall(DMSetFromOptions(dm)); >>>>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>>>> >>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>> DMPlexComputeCellTypes(dm); >>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>>>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>> cEndInterior, cEnd); >>>>>>> >>>>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>>>> PetscSectionSetNumFields(section, nField); >>>>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>>>> { >>>>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>>>> } >>>>>>> >>>>>>> PetscCall(PetscSectionSetUp(section)); >>>>>>> >>>>>>> DMSetLocalSection(dm, section); >>>>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>>>> >>>>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>>>> if (dm_dist) >>>>>>> { >>>>>>> DMDestroy(&dm); >>>>>>> dm = dm_dist; >>>>>>> } >>>>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>> DMPlexComputeCellTypes(dm); >>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>>>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>> cEndInterior, cEnd); >>>>>>> >>>>>>> DMDestroy(&dm); >>>>>>> PetscCall(PetscFinalize());} >>>>>>> >>>>>>> This codes output is currently (on 2 processors) is: >>>>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>>>> >>>>>>> DMView outputs: >>>>>>> dm1_view (after creation): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>> Number of 0-cells per rank: 64 0 >>>>>>> Number of 1-cells per rank: 144 0 >>>>>>> Number of 2-cells per rank: 108 0 >>>>>>> Number of 3-cells per rank: 27 0 >>>>>>> Labels: >>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>>>>> >>>>>>> dm2_view (after setfromoptions): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: 
>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>> Number of 1-cells per rank: 83 95 >>>>>>> Number of 2-cells per rank: 57 64 >>>>>>> Number of 3-cells per rank: 13 14 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>> >>>>>>> dm3_view (after setting local section): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>> Number of 0-cells per rank: 40 46 >>>>>>> Number of 1-cells per rank: 83 95 >>>>>>> Number of 2-cells per rank: 57 64 >>>>>>> Number of 3-cells per rank: 13 14 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>> Field Field_0: adjacency FEM >>>>>>> >>>>>>> dm4_view (after setting adjacency): >>>>>>> DM Object: 2 MPI processes >>>>>>> type: plex >>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>> Number of 0-cells per rank: 40 46 >>>>>>> Number of 1-cells per rank: 83 95 >>>>>>> Number of 2-cells per rank: 57 64 >>>>>>> Number of 3-cells per rank: 13 14 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>> Field Field_0: adjacency FVM++ >>>>>>> >>>>>>> dm5_view (after distribution): >>>>>>> DM Object: Parallel Mesh 2 MPI processes >>>>>>> type: plex >>>>>>> Parallel Mesh in 3 dimensions: >>>>>>> Number of 0-cells per rank: 64 60 >>>>>>> Number of 1-cells per rank: 144 133 >>>>>>> Number of 2-cells per rank: 108 98 >>>>>>> Number of 3-cells per rank: 27 24 >>>>>>> Labels: >>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>>>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>>>> Field Field_0: adjacency FVM++ >>>>>>> >>>>>>> Thanks, >>>>>>> Guer. >>>>>>> >>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>> >>>>>>> ------- Original Message ------- >>>>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>>>>>> >>>>>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. Let me explain my latest mesh routine: >>>>>>>>> >>>>>>>>> - DMPlexCreateBoxMesh >>>>>>>>> >>>>>>>>> - DMSetFromOptions >>>>>>>>> - PetscSectionCreate >>>>>>>>> - PetscSectionSetNumFields >>>>>>>>> - PetscSectionSetFieldDof >>>>>>>>> >>>>>>>>> - PetscSectionSetDof >>>>>>>>> >>>>>>>>> - PetscSectionSetUp >>>>>>>>> - DMSetLocalSection >>>>>>>>> - DMSetAdjacency >>>>>>>>> - DMPlexDistribute >>>>>>>>> >>>>>>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>>>>>> >>>>>>>> Please send the output of DMPlexView() for each incarnation of the mesh. 
What I do is put >>>>>>>> >>>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>>>>> >>>>>>>> with a different string after each call. >>>>>>>> >>>>>>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>>>>>> >>>>>>>> Please send the actual code because the above is not specific enough. For example, you will not have >>>>>>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>>>>>> so each process gets a unique set. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>>>>>> >>>>>>>>> Thanks again, >>>>>>>>> Guer. >>>>>>>>> >>>>>>>>> ------- Original Message ------- >>>>>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>>>>>> >>>>>>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>>>>>> >>>>>>>>>>> Create a DMPlex mesh from GMSH. >>>>>>>>>>> Reorder it with DMPlexPermute. >>>>>>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>>>>>> Create field(s) with multi-dofs. >>>>>>>>>>> Create residual vectors. >>>>>>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>>>>>> >>>>>>>>>> The intention was that there is enough information in the manual to do this. >>>>>>>>>> >>>>>>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>>>>>> >>>>>>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>>> Thank you. >>>>>>>>>>> Guer. >>>>>>>>>>> >>>>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>> >>>>>> -- >>>>>> >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 16 08:26:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 09:26:14 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: On Mon, Oct 16, 2023 at 9:22?AM erdemguer wrote: > Thank you for your responses many times. Looks like I'm missing something, > sorry for my confusion, but let's take processor 0 on your first output. > cEndInterior: 16 and cEnd: 24. > I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 > (col = 56) have influence on it. (Cell 18 is on processor boundary) > Shouldn't I have to write values on the (42,56)? > Imagine you are me getting this mail. When I mail you, I show you _exactly_ what I ran and which command line options I used. You do not. I provide you all the output. You do not. You can see that someone would only be guessing when replying to this email. Also note that you have two dofs per cell, so the cell numbers are not the row numbers for the Jacobian. Please send something reproducible when you want help on running. Thanks, Matt > Thanks, > Guer > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: > >> Hey again. >> >> This code outputs for example: >> >> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >> [0] m: 39 n: 39 >> [1] m: 42 n: 42 >> >> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on >> processor boundaries? 
>> > > Here is my output > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 > -malloc_debug 0 -dm_refine 1 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 > After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 > After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 > [0] m: 48 n: 48 > [1] m: 48 n: 48 > > The mesh is 4x4 and also split into two triangles, so 32 triangles. Then > we split it and have 8 overlap cells on each side. You can get quads using > > master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 > -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 > After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 > After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 > [0] m: 24 n: 24 > [1] m: 24 n: 24 > It is the same 4x4 mesh, but now with quads. > > Thanks, > > Matt > > P.S. It looks like I should use PetscFV or something like that at the >> first place. At first I thought, "I will just use SNES, I will compute only >> residual and jacobian on cells so why do bother with PetscFV?" So >> >> Thanks, >> E. >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >> >>> Hi, unfortunately it's me again. >>> >>> I have some weird troubles with creating matrix with DMPlex. Actually I >>> might not need to create matrix explicitly, but SNESSolve crashes at there >>> too. So, I updated the code you provided. When I tried to use >>> DMCreateMatrix() at first, I got an error "Unknown discretization type >>> for field 0" at first I applied DMSetLocalSection() and this error is gone. >>> But this time when I run the code with multiple processors, sometimes I got >>> an output like: >>> >> >> Some setup was out of order so the section size on proc1 was 0, and I was >> not good about checking this. >> I have fixed it and attached. 
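One ordering that avoids the empty-section pitfall is to build the cell-wise section on the mesh each rank actually ends up with, i.e. after distribution, and only then ask the DM for the Jacobian. A sketch of that ordering (an illustration, not the attached ex1.c; the single field and nDof are placeholders):

    #include <petscdmplex.h>

    static PetscErrorCode SetupAndCreateJacobian(DM dm, PetscInt nDof, Mat *J)
    {
      PetscSection s;
      PetscInt     cStart, cEnd, c, m, n, rStart, rEnd;

      PetscFunctionBeginUser;
      /* cell-wise section on the (already distributed) mesh: one field, nDof dofs per cell */
      PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd));
      PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s));
      PetscCall(PetscSectionSetNumFields(s, 1));
      PetscCall(PetscSectionSetChart(s, cStart, cEnd));
      for (c = cStart; c < cEnd; ++c) {
        PetscCall(PetscSectionSetFieldDof(s, c, 0, nDof));
        PetscCall(PetscSectionSetDof(s, c, nDof));
      }
      PetscCall(PetscSectionSetUp(s));
      PetscCall(DMSetLocalSection(dm, s));
      PetscCall(PetscSectionDestroy(&s));
      /* cell-to-cell coupling through faces, as in the "FVM++" adjacency shown above */
      PetscCall(DMSetAdjacency(dm, 0, PETSC_TRUE, PETSC_TRUE));
      PetscCall(DMCreateMatrix(dm, J));
      /* local sizes count only owned dofs; overlap cells add no rows or columns */
      PetscCall(MatGetLocalSize(*J, &m, &n));
      PetscCall(MatGetOwnershipRange(*J, &rStart, &rEnd));
      PetscCall(PetscSynchronizedPrintf(PetscObjectComm((PetscObject)dm), "m: %d n: %d rows [%d, %d)\n", (int)m, (int)n, (int)rStart, (int)rEnd));
      PetscCall(PetscSynchronizedFlush(PetscObjectComm((PetscObject)dm), PETSC_STDOUT));
      PetscFunctionReturn(PETSC_SUCCESS);
    }

The printed local sizes count owned dofs only, which is why the m x n values reported above come out square rather than rectangular.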
>> >> Thanks, >> >> Matt >> >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>> [1] ghost cell 14 >>> [1] ghost cell 15 >>> [1] ghost cell 16 >>> [1] ghost cell 17 >>> [1] ghost cell 18 >>> [1] ghost cell 19 >>> [1] ghost cell 20 >>> [1] ghost cell 21 >>> [1] ghost cell 22 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>> [0] ghost cell 13 >>> [0] ghost cell 14 >>> [0] ghost cell 15 >>> [0] ghost cell 16 >>> [0] ghost cell 17 >>> [0] ghost cell 18 >>> [0] ghost cell 19 >>> [0] ghost cell 20 >>> [0] ghost cell 21 >>> [0] ghost cell 22 >>> [0] ghost cell 23 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>> Fatal error in internal_Waitall: Unknown error class, error stack: >>> internal_Waitall(82)......................: MPI_Waitall(count=1, >>> array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>> MPIR_Waitall(1099)........................: >>> MPIR_Waitall_impl(1011)...................: >>> MPIR_Waitall_state(976)...................: >>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while >>> handling an event returned by MPIDI_CH3I_Sock_Wait() >>> MPIDI_CH3I_Progress_handle_sock_event(411): >>> ReadMoreData(744).........................: ch3|sock|immedread >>> 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 >>> MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains >>> invalid memory (set=0,sock=1,errno=14:Bad address) >>> >>> Sometimes the error message isn't appearing but for example I'm trying >>> to print size of the matrix but it isn't working. >>> If necessary, my Configure options --download-mpich --download-hwloc >>> --download-pastix --download-hypre --download-ml --download-ctetgen >>> --download-triangle --download-exodusii --download-netcdf --download-zlib >>> --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 >>> --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g >>> -O2" --with-debugging=1 >>> >>> Version: Petsc Release Version 3.20.0 >>> >>> Thank you, >>> Guer >>> >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < >>> erdemguer at proton.me> wrote: >>> >>> Thank you! That's exactly what I need. >>> >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>> >>>> Hi again, >>>> >>> >>> I see the problem. FV ghosts mean extra boundary cells added in FV >>> methods using DMPlexCreateGhostCells() in order to impose boundary >>> conditions. They are not the "ghost" cells for overlapping parallel >>> decompositions. I have changed your code to give you what you want. It is >>> attached. 
>>> >>> Thanks, >>> >>> Matt >>> >>>> Here is my code: >>>> #include >>>> static char help[] = "dmplex"; >>>> >>>> int main(int argc, char **argv) >>>> { >>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>> DM dm, dm_dist; >>>> PetscSection section; >>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>> PetscInt nc[3] = {3, 3, 3}; >>>> PetscReal upper[3] = {1, 1, 1}; >>>> >>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>> >>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >>>> NULL, PETSC_TRUE, &dm); >>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>> PetscCall(DMSetFromOptions(dm)); >>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>> >>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>> DMPlexComputeCellTypes(dm); >>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >>>> &cEndInterior, NULL)); >>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, >>>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>> cEndInterior, cEnd); >>>> >>>> PetscInt nField = 1, nDof = 3, field = 0; >>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>> PetscSectionSetNumFields(section, nField); >>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>> for (PetscInt p = cStart; p < cEnd; p++) >>>> { >>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>> } >>>> >>>> PetscCall(PetscSectionSetUp(section)); >>>> >>>> DMSetLocalSection(dm, section); >>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>> >>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>> if (dm_dist) >>>> { >>>> DMDestroy(&dm); >>>> dm = dm_dist; >>>> } >>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>> DMPlexComputeCellTypes(dm); >>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >>>> &cEndInterior, NULL)); >>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >>>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>> cEndInterior, cEnd); >>>> >>>> DMDestroy(&dm); >>>> PetscCall(PetscFinalize()); >>>> } >>>> >>>> This codes output is currently (on 2 processors) is: >>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>> >>>> DMView outputs: >>>> dm1_view (after creation): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 64 0 >>>> Number of 1-cells per rank: 144 0 >>>> Number of 2-cells per rank: 108 0 >>>> Number of 3-cells per rank: 27 0 >>>> Labels: >>>> marker: 1 strata with value/size (1 (218)) >>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), >>>> 2 (9)) >>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>> >>>> dm2_view (after setfromoptions): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 40 46 >>>> Number of 1-cells per rank: 83 95 >>>> Number of 2-cells per rank: 57 64 >>>> Number of 3-cells per rank: 13 14 >>>> Labels: >>>> depth: 4 strata with value/size (0 
(40), 1 (83), 2 (57), 3 (13)) >>>> marker: 1 strata with value/size (1 (109)) >>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>> >>>> dm3_view (after setting local section): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 40 46 >>>> Number of 1-cells per rank: 83 95 >>>> Number of 2-cells per rank: 57 64 >>>> Number of 3-cells per rank: 13 14 >>>> Labels: >>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>> marker: 1 strata with value/size (1 (109)) >>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>> Field Field_0: >>>> adjacency FEM >>>> >>>> dm4_view (after setting adjacency): >>>> DM Object: 2 MPI processes >>>> type: plex >>>> DM_0x84000004_0 in 3 dimensions: >>>> Number of 0-cells per rank: 40 46 >>>> Number of 1-cells per rank: 83 95 >>>> Number of 2-cells per rank: 57 64 >>>> Number of 3-cells per rank: 13 14 >>>> Labels: >>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>> marker: 1 strata with value/size (1 (109)) >>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>> Field Field_0: >>>> adjacency FVM++ >>>> >>>> dm5_view (after distribution): >>>> DM Object: Parallel Mesh 2 MPI processes >>>> type: plex >>>> Parallel Mesh in 3 dimensions: >>>> Number of 0-cells per rank: 64 60 >>>> Number of 1-cells per rank: 144 133 >>>> Number of 2-cells per rank: 108 98 >>>> Number of 3-cells per rank: 27 24 >>>> Labels: >>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>> marker: 1 strata with value/size (1 (218)) >>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), >>>> 6 (9)) >>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>> Field Field_0: >>>> adjacency FVM++ >>>> >>>> Thanks, >>>> Guer. >>>> Sent with Proton Mail secure email. >>>> >>>> ------- Original Message ------- >>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >>>> knepley at gmail.com> wrote: >>>> >>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>> >>>>> >>>>> Hi, >>>>> Sorry for my late response. I tried with your suggestions and I think >>>>> I made a progress. But I still got issues. Let me explain my latest mesh >>>>> routine: >>>>> >>>>> >>>>> 1. DMPlexCreateBoxMesh >>>>> 2. DMSetFromOptions >>>>> 3. PetscSectionCreate >>>>> 4. PetscSectionSetNumFields >>>>> 5. PetscSectionSetFieldDof >>>>> 6. PetscSectionSetDof >>>>> 7. PetscSectionSetUp >>>>> 8. DMSetLocalSection >>>>> 9. DMSetAdjacency >>>>> 10. DMPlexDistribute >>>>> >>>>> >>>>> It's still not working but it's promising, if I call >>>>> DMPlexGetDepthStratum for cells, I can see that after distribution >>>>> processors have more cells. >>>>> >>>> >>>> Please send the output of DMPlexView() for each incarnation of the >>>> mesh. What I do is put >>>> >>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>> >>>> >>>> with a different string after each call. >>>> >>>>> But I couldn't figure out how to decide where the ghost/processor >>>>> boundary cells start. >>>>> >>>> >>>> Please send the actual code because the above is not specific enough. >>>> For example, you will not have >>>> "ghost cells" unless you partition with overlap. 
This is because by >>>> default cells are the partitioned quantity, >>>> so each process gets a unique set. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>>>> think that function is deprecated. I tried to use, >>>>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm getting >>>>> -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>>>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>>>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>>>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>>>> after distribution but I think there is a better way I'm currently missing. >>>>> >>>>> Thanks again, >>>>> Guer. >>>>> >>>>> ------- Original Message ------- >>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>>>> knepley at gmail.com> wrote: >>>>> >>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am currently using DMPlex in my code. It runs serially at the >>>>>> moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>> >>>>>> Create a DMPlex mesh from GMSH. >>>>>> Reorder it with DMPlexPermute. >>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>> Create field(s) with multi-dofs. >>>>>> Create residual vectors. >>>>>> Define a function to calculate the residual for each cell and, use >>>>>> SNES. >>>>>> As you can see, I'm not using FV or FE structures (most examples do). >>>>>> Now, I'm trying to implement this in parallel using a similar approach. >>>>>> However, I'm struggling to understand how to create corresponding vectors >>>>>> and how to obtain index sets for each processor. Is there a tutorial or >>>>>> paper that covers this topic? >>>>>> >>>>> >>>>> The intention was that there is enough information in the manual to do >>>>> this. >>>>> >>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage >>>>> you to use PetscSection. Without this, it would be incredibly hard to do >>>>> what you want. Once the DM has a Section, it can do things like >>>>> automatically create vectors and matrices for you. It can redistribute >>>>> them, subset them, etc. The Section describes how dofs are assigned to >>>>> pieces of the mesh (mesh points). This is in the manual, and there are a >>>>> few examples that do it by hand. >>>>> >>>>> So I suggest changing your code to use PetscSection, and then letting >>>>> us know if things still do not work. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thank you. >>>>>> Guer. >>>>>> >>>>>> Sent with Proton Mail secure email. >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From erdemguer at proton.me Mon Oct 16 09:10:11 2023 From: erdemguer at proton.me (erdemguer) Date: Mon, 16 Oct 2023 14:10:11 +0000 Subject: [petsc-users] Parallel DMPlex In-Reply-To: References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> Message-ID: <90bHf8yDZXoFytUTk641jXgda3Mn6NMzpRaq3fL8oFuz05hAmy1THxKDvq7gKZZcy3ejvqnFZAaKLy5-WegRmL-TDPZVzjM_Y5-SR0FK3DY=@proton.me> I'm truly sorry for my bad. I set the nDof = 1 for simplicity. You can find my code in the attachments. In that code I tried to find an example of a cell which is neighbor to a cell in the another processor and print them. Here is my output: (base) ? build git:(main) ? /petsc/lib/petsc/bin/petscmpiexec -n 2 ./ex1_eg -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 3,3,3 Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 [0] m: 13 n: 13 [1] m: 14 n: 14 [1] Face: 94, Center Cell: 7, Ghost Neighbor Cell: 23[0] Face: 145, Center Cell: 12, Ghost Neighbor Cell: 20 For example, if I'm writing residual for cell 12 on rank 0, I thought I need to write on (12,20) on the matrix too. But looks like that isn't the case. Thanks, Guer Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Monday, October 16th, 2023 at 4:26 PM, Matthew Knepley wrote: > On Mon, Oct 16, 2023 at 9:22?AM erdemguer wrote: > >> Thank you for your responses many times. Looks like I'm missing something, sorry for my confusion, but let's take processor 0 on your first output. cEndInterior: 16 and cEnd: 24. >> I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 (col = 56) have influence on it. (Cell 18 is on processor boundary) >> Shouldn't I have to write values on the (42,56)? > > Imagine you are me getting this mail. When I mail you, I show you _exactly_ what I ran and which command line options I used. You do not. I provide you all the output. You do not. You can see that someone would only be guessing when replying to this email. Also note that you have two dofs per cell, so the cell numbers are not the row numbers for the Jacobian. Please send something reproducible when you want help on running. > > Thanks, > > Matt > >> Thanks, >> Guer >> >> Sent with [Proton Mail](https://proton.me/) secure email. 
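On the recurring row/column question: a (cell, component) pair is translated to a global matrix index by the DM's global section, not by the local cell number, and entries whose row or column is owned by another rank may still be set locally. A hypothetical helper (not the attached ex1_eg.c) showing the lookup:

    #include <petscdmplex.h>

    /* Global matrix indices for (cell, component): read the global section.
       Points owned by another rank store -(offset+1), hence the sign fix-up. */
    static PetscErrorCode AddCellCoupling(DM dm, Mat J, PetscInt cell, PetscInt nbr, PetscInt comp, PetscScalar v)
    {
      PetscSection gsection;
      PetscInt     rOff, cOff, row, col;

      PetscFunctionBeginUser;
      PetscCall(DMGetGlobalSection(dm, &gsection));
      PetscCall(PetscSectionGetOffset(gsection, cell, &rOff));
      PetscCall(PetscSectionGetOffset(gsection, nbr, &cOff));
      if (rOff < 0) rOff = -(rOff + 1); /* cell is an overlap copy owned elsewhere */
      if (cOff < 0) cOff = -(cOff + 1);
      row = rOff + comp;
      col = cOff + comp;
      /* entries destined for rows or columns owned elsewhere are stashed and communicated during assembly */
      PetscCall(MatSetValues(J, 1, &row, 1, &col, &v, ADD_VALUES));
      PetscFunctionReturn(PETSC_SUCCESS);
    }

So the coupling in the example above does not land at (12, 20); it lands at the global offsets of cells 12 and 20 plus the component index, and MatAssemblyBegin()/MatAssemblyEnd() with MAT_FINAL_ASSEMBLY after the insertion loop ships any entries whose rows live on another rank.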
>> >> ------- Original Message ------- >> On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley wrote: >> >>> On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: >>> >>>> Hey again. >>>> >>>> This code outputs for example: >>>> >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >>>> [0] m: 39 n: 39[1] m: 42 n: 42 >>>> >>>> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on processor boundaries? >>> >>> Here is my output >>> >>> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_refine 1 >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 >>> [0] m: 48 n: 48 >>> [1] m: 48 n: 48 >>> >>> The mesh is 4x4 and also split into two triangles, so 32 triangles. Then we split it and have 8 overlap cells on each side. You can get quads using >>> >>> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view >>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 >>> [0] m: 24 n: 24 >>> [1] m: 24 n: 24 >>> >>> It is the same 4x4 mesh, but now with quads. >>> >>> Thanks, >>> >>> Matt >>> >>>> P.S. It looks like I should use PetscFV or something like that at the first place. At first I thought, "I will just use SNES, I will compute only residual and jacobian on cells so why do bother with PetscFV?" So >>>> >>>> Thanks, >>>> E. >>>> >>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>> >>>> ------- Original Message ------- >>>> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley wrote: >>>> >>>>> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >>>>> >>>>>> Hi, unfortunately it's me again. >>>>>> >>>>>> I have some weird troubles with creating matrix with DMPlex. Actually I might not need to create matrix explicitly, but SNESSolve crashes at there too. So, I updated the code you provided. When I tried to use DMCreateMatrix() at first, I got an error "Unknown discretization type for field 0" at first I applied DMSetLocalSection() and this error is gone. But this time when I run the code with multiple processors, sometimes I got an output like: >>>>> >>>>> Some setup was out of order so the section size on proc1 was 0, and I was not good about checking this. >>>>> I have fixed it and attached. 
>>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>>>>> [1] ghost cell 14 >>>>>> [1] ghost cell 15 >>>>>> [1] ghost cell 16 >>>>>> [1] ghost cell 17 >>>>>> [1] ghost cell 18 >>>>>> [1] ghost cell 19 >>>>>> [1] ghost cell 20 >>>>>> [1] ghost cell 21 >>>>>> [1] ghost cell 22 >>>>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>>>>> [0] ghost cell 13 >>>>>> [0] ghost cell 14 >>>>>> [0] ghost cell 15 >>>>>> [0] ghost cell 16 >>>>>> [0] ghost cell 17 >>>>>> [0] ghost cell 18 >>>>>> [0] ghost cell 19 >>>>>> [0] ghost cell 20 >>>>>> [0] ghost cell 21 >>>>>> [0] ghost cell 22 >>>>>> [0] ghost cell 23 >>>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>>>>> Fatal error in internal_Waitall: Unknown error class, error stack: >>>>>> internal_Waitall(82)......................: MPI_Waitall(count=1, array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>>>>> MPIR_Waitall(1099)........................: >>>>>> MPIR_Waitall_impl(1011)...................: >>>>>> MPIR_Waitall_state(976)...................: >>>>>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while handling an event returned by MPIDI_CH3I_Sock_Wait() >>>>>> MPIDI_CH3I_Progress_handle_sock_event(411): >>>>>> ReadMoreData(744).........................: ch3|sock|immedread 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer contains invalid memory (set=0,sock=1,errno=14:Bad address) >>>>>> >>>>>> Sometimes the error message isn't appearing but for example I'm trying to print size of the matrix but it isn't working. >>>>>> If necessary, my Configure options --download-mpich --download-hwloc --download-pastix --download-hypre --download-ml --download-ctetgen --download-triangle --download-exodusii --download-netcdf --download-zlib --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" --with-debugging=1 >>>>>> >>>>>> Version: Petsc Release Version 3.20.0 >>>>>> >>>>>> Thank you, >>>>>> Guer >>>>>> >>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>> >>>>>> ------- Original Message ------- >>>>>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer wrote: >>>>>> >>>>>>> Thank you! That's exactly what I need. >>>>>>> >>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>> >>>>>>> ------- Original Message ------- >>>>>>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley wrote: >>>>>>> >>>>>>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>>>>>> >>>>>>>>> Hi again, >>>>>>>> >>>>>>>> I see the problem. FV ghosts mean extra boundary cells added in FV methods using DMPlexCreateGhostCells() in order to impose boundary conditions. They are not the "ghost" cells for overlapping parallel decompositions. I have changed your code to give you what you want. It is attached. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> Here is my code: >>>>>>>>> #include >>>>>>>>> static char help[] = "dmplex"; >>>>>>>>> >>>>>>>>> int main(int argc, char **argv) >>>>>>>>> { >>>>>>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>>>>>> DM dm, dm_dist; >>>>>>>>> PetscSection section; >>>>>>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>>>>>> PetscInt nc[3] = {3, 3, 3}; >>>>>>>>> PetscReal upper[3] = {1, 1, 1}; >>>>>>>>> >>>>>>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>>>>>> >>>>>>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, NULL, PETSC_TRUE, &dm); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>>>>>> PetscCall(DMSetFromOptions(dm)); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>>>>>> >>>>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>>>> DMPlexComputeCellTypes(dm); >>>>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, &cEndInterior, NULL)); >>>>>>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>>>> cEndInterior, cEnd); >>>>>>>>> >>>>>>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>>>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>>>>>> PetscSectionSetNumFields(section, nField); >>>>>>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>>>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>>>>>> { >>>>>>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>>>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>>>>>> } >>>>>>>>> >>>>>>>>> PetscCall(PetscSectionSetUp(section)); >>>>>>>>> >>>>>>>>> DMSetLocalSection(dm, section); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>>>>>> >>>>>>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>>>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>>>>>> if (dm_dist) >>>>>>>>> { >>>>>>>>> DMDestroy(&dm); >>>>>>>>> dm = dm_dist; >>>>>>>>> } >>>>>>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>>>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>>>>>> DMPlexComputeCellTypes(dm); >>>>>>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, &cEndInterior, NULL)); >>>>>>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>>>>>> cEndInterior, cEnd); >>>>>>>>> >>>>>>>>> DMDestroy(&dm); >>>>>>>>> PetscCall(PetscFinalize());} >>>>>>>>> >>>>>>>>> This codes output is currently (on 2 processors) is: >>>>>>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>>>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>>>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>>>>>> >>>>>>>>> DMView outputs: >>>>>>>>> dm1_view (after creation): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 64 0 >>>>>>>>> Number of 1-cells per rank: 144 0 >>>>>>>>> Number of 2-cells per rank: 108 0 >>>>>>>>> Number of 3-cells per rank: 27 0 >>>>>>>>> Labels: >>>>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 (9), 2 (9)) >>>>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) celltype: 4 strata with value/size (7 (27), 0 
(64), 4 (108), 1 (144)) >>>>>>>>> >>>>>>>>> dm2_view (after setfromoptions): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>>>> Number of 1-cells per rank: 83 95 >>>>>>>>> Number of 2-cells per rank: 57 64 >>>>>>>>> Number of 3-cells per rank: 13 14 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>>>> >>>>>>>>> dm3_view (after setting local section): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>>>> Number of 1-cells per rank: 83 95 >>>>>>>>> Number of 2-cells per rank: 57 64 >>>>>>>>> Number of 3-cells per rank: 13 14 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>>>> Field Field_0: adjacency FEM >>>>>>>>> >>>>>>>>> dm4_view (after setting adjacency): >>>>>>>>> DM Object: 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> DM_0x84000004_0 in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 40 46 >>>>>>>>> Number of 1-cells per rank: 83 95 >>>>>>>>> Number of 2-cells per rank: 57 64 >>>>>>>>> Number of 3-cells per rank: 13 14 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>>>>>> marker: 1 strata with value/size (1 (109)) >>>>>>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>>>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>>>>>> Field Field_0: adjacency FVM++ >>>>>>>>> >>>>>>>>> dm5_view (after distribution): >>>>>>>>> DM Object: Parallel Mesh 2 MPI processes >>>>>>>>> type: plex >>>>>>>>> Parallel Mesh in 3 dimensions: >>>>>>>>> Number of 0-cells per rank: 64 60 >>>>>>>>> Number of 1-cells per rank: 144 133 >>>>>>>>> Number of 2-cells per rank: 108 98 >>>>>>>>> Number of 3-cells per rank: 27 24 >>>>>>>>> Labels: >>>>>>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>>>>>> marker: 1 strata with value/size (1 (218)) >>>>>>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 (9), 6 (9)) >>>>>>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>>>>>> Field Field_0: adjacency FVM++ >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Guer. >>>>>>>>> >>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>>> >>>>>>>>> ------- Original Message ------- >>>>>>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley wrote: >>>>>>>>> >>>>>>>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> Sorry for my late response. I tried with your suggestions and I think I made a progress. But I still got issues. 
Let me explain my latest mesh routine: >>>>>>>>>>> >>>>>>>>>>> - DMPlexCreateBoxMesh >>>>>>>>>>> >>>>>>>>>>> - DMSetFromOptions >>>>>>>>>>> - PetscSectionCreate >>>>>>>>>>> - PetscSectionSetNumFields >>>>>>>>>>> - PetscSectionSetFieldDof >>>>>>>>>>> >>>>>>>>>>> - PetscSectionSetDof >>>>>>>>>>> >>>>>>>>>>> - PetscSectionSetUp >>>>>>>>>>> - DMSetLocalSection >>>>>>>>>>> - DMSetAdjacency >>>>>>>>>>> - DMPlexDistribute >>>>>>>>>>> >>>>>>>>>>> It's still not working but it's promising, if I call DMPlexGetDepthStratum for cells, I can see that after distribution processors have more cells. >>>>>>>>>> >>>>>>>>>> Please send the output of DMPlexView() for each incarnation of the mesh. What I do is put >>>>>>>>>> >>>>>>>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>>>>>>> >>>>>>>>>> with a different string after each call. >>>>>>>>>> >>>>>>>>>>> But I couldn't figure out how to decide where the ghost/processor boundary cells start. >>>>>>>>>> >>>>>>>>>> Please send the actual code because the above is not specific enough. For example, you will not have >>>>>>>>>> "ghost cells" unless you partition with overlap. This is because by default cells are the partitioned quantity, >>>>>>>>>> so each process gets a unique set. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I think that function is deprecated. I tried to use, DMPlexGetCellTypeStratumas in ts/tutorials/ex11_sa.c but I'm getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. I think I can calculate the ghost cell indices using cStart/cEnd before & after distribution but I think there is a better way I'm currently missing. >>>>>>>>>>> >>>>>>>>>>> Thanks again, >>>>>>>>>>> Guer. >>>>>>>>>>> >>>>>>>>>>> ------- Original Message ------- >>>>>>>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley wrote: >>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am currently using DMPlex in my code. It runs serially at the moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>>>>>>>> >>>>>>>>>>>>> Create a DMPlex mesh from GMSH. >>>>>>>>>>>>> Reorder it with DMPlexPermute. >>>>>>>>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>>>>>>>> Create field(s) with multi-dofs. >>>>>>>>>>>>> Create residual vectors. >>>>>>>>>>>>> Define a function to calculate the residual for each cell and, use SNES. >>>>>>>>>>>>> As you can see, I'm not using FV or FE structures (most examples do). Now, I'm trying to implement this in parallel using a similar approach. However, I'm struggling to understand how to create corresponding vectors and how to obtain index sets for each processor. Is there a tutorial or paper that covers this topic? >>>>>>>>>>>> >>>>>>>>>>>> The intention was that there is enough information in the manual to do this. >>>>>>>>>>>> >>>>>>>>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage you to use PetscSection. Without this, it would be incredibly hard to do what you want. Once the DM has a Section, it can do things like automatically create vectors and matrices for you. It can redistribute them, subset them, etc. 
The Section describes how dofs are assigned to pieces of the mesh (mesh points). This is in the manual, and there are a few examples that do it by hand. >>>>>>>>>>>> >>>>>>>>>>>> So I suggest changing your code to use PetscSection, and then letting us know if things still do not work. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> Thank you. >>>>>>>>>>>>> Guer. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent with [Proton Mail](https://proton.me/) secure email. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> >>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>>>> >>>>> -- >>>>> >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) >>> >>> -- >>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ex1_eg.c URL: From facklerpw at ornl.gov Mon Oct 16 09:33:01 2023 From: facklerpw at ornl.gov (Fackler, Philip) Date: Mon, 16 Oct 2023 14:33:01 +0000 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Junchao, I've attached updated timing plots (red and blue are swapped from before; yellow is the new one). There is an improvement for the NE_3 case only with CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI cases, MatShift doesn't show up (I assume because we're using different preconditioner arguments). So, there must be some other primary culprit. I'll try to get updated profiling data to you soon. 
Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Fackler, Philip via Xolotl-psi-development Sent: Wednesday, October 11, 2023 11:31 To: Junchao Zhang Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface I'm on it. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang Sent: Wednesday, October 11, 2023 10:14 To: Fackler, Philip Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net ; Blondel, Sophie Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Hi, Philip, Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ? Thanks. --Junchao Zhang On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip > wrote: Aha! That makes sense. Thank you. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang > Sent: Thursday, October 5, 2023 17:29 To: Fackler, Philip > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net >; Blondel, Sophie > Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface Wait a moment, it seems it was because we do not have a GPU implementation of MatShift... Let me see how to add it. --Junchao Zhang On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: Hi, Philip, I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() instead of the COO interface? MatSetValues() needs to copy the data from device to host and thus is expensive. Do you have profiling results with COO enabled? [Screenshot 2023-10-05 at 10.55.29?AM.png] --Junchao Zhang On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: Hi, Philip, I will look into the tarballs and get back to you. Thanks. --Junchao Zhang On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users > wrote: We finally have xolotl ported to use the new COO interface and the aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port to our previous version (using MatSetValuesStencil and the default Mat and Vec implementations), we expected to see an improvement in performance for both the "serial" and "cuda" builds (here I'm referring to the kokkos configuration). Attached are two plots that show timings for three different cases. All of these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on a single node). The CUDA cases were given one GPU per task (and used CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible. The performance of RHSJacobian (where the bulk of computation happens in xolotl) behaved basically as expected (better than expected in the serial build). NE_3 case in CUDA was the only one that performed worse, but not surprisingly, since its workload for the GPUs is much smaller. 
We've still got more optimization to do on this. The real surprise was how much worse the overall solve times were. This seems to be due simply to switching to the kokkos-based implementation. I'm wondering if there are any changes we can make in configuration or runtime arguments to help with PETSc's performance here. Any help looking into this would be appreciated. The tarballs linked here and here are profiling databases which, once extracted, can be viewed with hpcviewer. I don't know how helpful that will be, but hopefully it can give you some direction. Thanks for your help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Total Solve Times.png Type: image/png Size: 15648 bytes Desc: Total Solve Times.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: RHSJacobian() calls.png Type: image/png Size: 15568 bytes Desc: RHSJacobian() calls.png URL: From knepley at gmail.com Mon Oct 16 09:35:57 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 10:35:57 -0400 Subject: [petsc-users] Parallel DMPlex In-Reply-To: <90bHf8yDZXoFytUTk641jXgda3Mn6NMzpRaq3fL8oFuz05hAmy1THxKDvq7gKZZcy3ejvqnFZAaKLy5-WegRmL-TDPZVzjM_Y5-SR0FK3DY=@proton.me> References: <4R73GX8FErHKfozdfRTz5jF6HtHo_s1_A8BGitE9u-w00Cd-bHkqTR7mycihTu93NknXVjLYIUv9oQGLfR-S3TolpZiSrGmV6IRcfPiFIV0=@proton.me> <90bHf8yDZXoFytUTk641jXgda3Mn6NMzpRaq3fL8oFuz05hAmy1THxKDvq7gKZZcy3ejvqnFZAaKLy5-WegRmL-TDPZVzjM_Y5-SR0FK3DY=@proton.me> Message-ID: On Mon, Oct 16, 2023 at 10:10?AM erdemguer wrote: > I'm truly sorry for my bad. > I set the nDof = 1 for simplicity. You can find my code in the > attachments. In that code I tried to find an example of a cell which is > neighbor to a cell in the another processor and print them. > Here is my output: > (base) ? build git:(main) ? /petsc/lib/petsc/bin/petscmpiexec -n 2 > ./ex1_eg -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 3,3,3 > Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 > Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 > After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 > After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 > [0] m: 13 n: 13 > [1] m: 14 n: 14 > [1] Face: 94, Center Cell: 7, Ghost Neighbor Cell: 23 > [0] Face: 145, Center Cell: 12, Ghost Neighbor Cell: 20 > You can force us to have the same partition using -petscpartitioner_type simple, master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 -malloc_debug 0 -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 3,3,3 -petscpartitioner_type simple Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 After Distribution Rank: 0, cStart: 0, cEndInterior: 14, cEnd: 27 After Distribution Rank: 1, cStart: 0, cEndInterior: 13, cEnd: 26 [0] m: 14 n: 14 [1] m: 13 n: 13 [1] Face: 89, Center Cell: 0, Ghost Neighbor Cell: 25 [0] Face: 140, Center Cell: 13, Ghost Neighbor Cell: 14 > For example, if I'm writing residual for cell 12 on rank 0, I thought I > need to write on (12,20) on the matrix too. But looks like that isn't the > case. 
> There are two problems here: 1) 20 is the _local_ number of that cell, but matrices use global numbers. If you want to know that global number of that dof, it is two steps. First, you need the cell number on the other process. You can get this from the pointSF. If leaves[i] = 20, then remotes[i].index = Then you need the dof for that remote cell. However, this work has already been done by the global Section. So DMGetGlobalSection(dm, &gsec); PetscSectionGetOffset(gsec, 20, &off); off = -(off + 1); since dofs we do not own will be encoded as -(dof + 1). 2) You need to decide how you want to assemble. Do we assemble the contributions from the cells we own, or from the faces we own. Most FV people divide up the faces. Thanks, Matt > Thanks, > Guer > > Sent with Proton Mail secure email. > > ------- Original Message ------- > On Monday, October 16th, 2023 at 4:26 PM, Matthew Knepley < > knepley at gmail.com> wrote: > > On Mon, Oct 16, 2023 at 9:22?AM erdemguer wrote: > >> Thank you for your responses many times. Looks like I'm missing >> something, sorry for my confusion, but let's take processor 0 on your first >> output. cEndInterior: 16 and cEnd: 24. >> I'm calculating jacobian for cell=14, dof=0 (row = 42) and cell=18, dof=2 >> (col = 56) have influence on it. (Cell 18 is on processor boundary) >> Shouldn't I have to write values on the (42,56)? >> > > Imagine you are me getting this mail. When I mail you, I show you > _exactly_ what I ran and which command line options I used. You do not. I > provide you all the output. You do not. You can see that someone would only > be guessing when replying to this email. Also note that you have two dofs > per cell, so the cell numbers are not the row numbers for the Jacobian. > Please send something reproducible when you want help on running. > > Thanks, > > Matt > >> Thanks, >> Guer >> >> Sent with Proton Mail secure email. >> >> ------- Original Message ------- >> On Monday, October 16th, 2023 at 4:11 PM, Matthew Knepley < >> knepley at gmail.com> wrote: >> >> On Mon, Oct 16, 2023 at 6:54?AM erdemguer wrote: >> >>> Hey again. >>> >>> This code outputs for example: >>> >>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 24 >>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 27 >>> [0] m: 39 n: 39 >>> [1] m: 42 n: 42 >>> >>> Shouldn't it be 39 x 81 and 42 x 72 because of the overlapping cells on >>> processor boundaries? >>> >> >> Here is my output >> >> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 >> -malloc_debug 0 -dm_refine 1 >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 32, cEnd: 32 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 16, cEnd: 24 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 24 >> [0] m: 48 n: 48 >> [1] m: 48 n: 48 >> >> The mesh is 4x4 and also split into two triangles, so 32 triangles. Then >> we split it and have 8 overlap cells on each side. You can get quads using >> >> master *:~/Downloads/tmp/Guer$ /PETSc3/petsc/apple/bin/mpiexec -n 2 ./ex1 >> -malloc_debug 0 -dm_plex_simplex 0 -dm_refine 1 -dm_view >> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >> Before Distribution Rank: 0, cStart: 0, cEndInterior: 16, cEnd: 16 >> After Distribution Rank: 1, cStart: 0, cEndInterior: 8, cEnd: 12 >> After Distribution Rank: 0, cStart: 0, cEndInterior: 8, cEnd: 12 >> [0] m: 24 n: 24 >> [1] m: 24 n: 24 >> It is the same 4x4 mesh, but now with quads. 
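A minimal sketch of the lookup described above (illustrative names only: GetGlobalRow, c for the local point number of the overlap cell, d for the dof within it; it assumes the DM already carries its local Section). The pointSF route, DMGetPointSF() plus PetscSFGetGraph(), would additionally give the owning rank and the point number on that rank in remotes[i].rank / remotes[i].index, but as noted the global Section already encodes the global offset:

static PetscErrorCode GetGlobalRow(DM dm, PetscInt c, PetscInt d, PetscInt *row)
{
  PetscSection gsec;
  PetscInt     off;

  PetscFunctionBeginUser;
  PetscCall(DMGetGlobalSection(dm, &gsec));
  PetscCall(PetscSectionGetOffset(gsec, c, &off));
  if (off < 0) off = -(off + 1); /* points owned by another rank come back encoded as -(off + 1) */
  *row = off + d;                /* global row (or column) index of dof d on cell c */
  PetscFunctionReturn(PETSC_SUCCESS);
}

These global indices are what MatSetValues() expects, whichever way the assembly work is divided up.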
>> >> Thanks, >> >> Matt >> >> P.S. It looks like I should use PetscFV or something like that at the >>> first place. At first I thought, "I will just use SNES, I will compute only >>> residual and jacobian on cells so why do bother with PetscFV?" So >>> >>> Thanks, >>> E. >>> Sent with Proton Mail secure email. >>> >>> ------- Original Message ------- >>> On Friday, October 13th, 2023 at 3:00 PM, Matthew Knepley < >>> knepley at gmail.com> wrote: >>> >>> On Fri, Oct 13, 2023 at 7:26?AM erdemguer wrote: >>> >>>> Hi, unfortunately it's me again. >>>> >>>> I have some weird troubles with creating matrix with DMPlex. Actually I >>>> might not need to create matrix explicitly, but SNESSolve crashes at there >>>> too. So, I updated the code you provided. When I tried to use >>>> DMCreateMatrix() at first, I got an error "Unknown discretization type >>>> for field 0" at first I applied DMSetLocalSection() and this error is gone. >>>> But this time when I run the code with multiple processors, sometimes I got >>>> an output like: >>>> >>> >>> Some setup was out of order so the section size on proc1 was 0, and I >>> was not good about checking this. >>> I have fixed it and attached. >>> >>> Thanks, >>> >>> Matt >>> >>> Before Distribution Rank: 0, cStart: 0, cEndInterior: 27, cEnd: 27 >>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: 0, cEnd: 0 >>>> [1] ghost cell 14 >>>> [1] ghost cell 15 >>>> [1] ghost cell 16 >>>> [1] ghost cell 17 >>>> [1] ghost cell 18 >>>> [1] ghost cell 19 >>>> [1] ghost cell 20 >>>> [1] ghost cell 21 >>>> [1] ghost cell 22 >>>> After Distribution Rank: 1, cStart: 0, cEndInterior: 14, cEnd: 23 >>>> [0] ghost cell 13 >>>> [0] ghost cell 14 >>>> [0] ghost cell 15 >>>> [0] ghost cell 16 >>>> [0] ghost cell 17 >>>> [0] ghost cell 18 >>>> [0] ghost cell 19 >>>> [0] ghost cell 20 >>>> [0] ghost cell 21 >>>> [0] ghost cell 22 >>>> [0] ghost cell 23 >>>> After Distribution Rank: 0, cStart: 0, cEndInterior: 13, cEnd: 24 >>>> Fatal error in internal_Waitall: Unknown error class, error stack: >>>> internal_Waitall(82)......................: MPI_Waitall(count=1, >>>> array_of_requests=0xaaaaf5f72264, array_of_statuses=0x1) failed >>>> MPIR_Waitall(1099)........................: >>>> MPIR_Waitall_impl(1011)...................: >>>> MPIR_Waitall_state(976)...................: >>>> MPIDI_CH3i_Progress_wait(187).............: an error occurred while >>>> handling an event returned by MPIDI_CH3I_Sock_Wait() >>>> MPIDI_CH3I_Progress_handle_sock_event(411): >>>> ReadMoreData(744).........................: ch3|sock|immedread >>>> 0xffff8851c5c0 0xaaaaf5e81cd0 0xaaaaf5e8a880 >>>> MPIDI_CH3I_Sock_readv(2553)...............: the supplied buffer >>>> contains invalid memory (set=0,sock=1,errno=14:Bad address) >>>> >>>> Sometimes the error message isn't appearing but for example I'm trying >>>> to print size of the matrix but it isn't working. >>>> If necessary, my Configure options --download-mpich --download-hwloc >>>> --download-pastix --download-hypre --download-ml --download-ctetgen >>>> --download-triangle --download-exodusii --download-netcdf --download-zlib >>>> --download-pnetcdf --download-ptscotch --download-hdf5 --with-cc=clang-16 >>>> --with-cxx=clang++-16 COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g >>>> -O2" --with-debugging=1 >>>> >>>> Version: Petsc Release Version 3.20.0 >>>> >>>> Thank you, >>>> Guer >>>> >>>> Sent with Proton Mail secure email. 
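For reference, a minimal sketch of the call order that avoids the "Unknown discretization type for field 0" error: the DM has to carry a PetscSection before DMCreateMatrix() is asked to preallocate. This is essentially the same setup as the code quoted further down in this thread (one field, nDof dofs per cell; dm, s, J and nDof are illustrative names, and adjacency/distribution are assumed to be set up as in that code):

PetscSection s;
Mat          J;
PetscInt     cStart, cEnd, nDof = 3;

PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s));
PetscCall(PetscSectionSetNumFields(s, 1));
PetscCall(PetscSectionSetChart(s, cStart, cEnd));
for (PetscInt c = cStart; c < cEnd; ++c) {
  PetscCall(PetscSectionSetFieldDof(s, c, 0, nDof));
  PetscCall(PetscSectionSetDof(s, c, nDof));
}
PetscCall(PetscSectionSetUp(s));
PetscCall(DMSetLocalSection(dm, s));   /* after this the DM knows the dof layout */
PetscCall(PetscSectionDestroy(&s));    /* the DM keeps its own reference */
PetscCall(DMCreateMatrix(dm, &J));     /* preallocation now comes from the Section */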
>>>> >>>> ------- Original Message ------- >>>> On Thursday, October 12th, 2023 at 12:59 AM, erdemguer < >>>> erdemguer at proton.me> wrote: >>>> >>>> Thank you! That's exactly what I need. >>>> >>>> Sent with Proton Mail secure email. >>>> >>>> ------- Original Message ------- >>>> On Wednesday, October 11th, 2023 at 4:17 PM, Matthew Knepley < >>>> knepley at gmail.com> wrote: >>>> >>>> On Wed, Oct 11, 2023 at 4:42?AM erdemguer wrote: >>>> >>>>> Hi again, >>>>> >>>> >>>> I see the problem. FV ghosts mean extra boundary cells added in FV >>>> methods using DMPlexCreateGhostCells() in order to impose boundary >>>> conditions. They are not the "ghost" cells for overlapping parallel >>>> decompositions. I have changed your code to give you what you want. It is >>>> attached. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>> Here is my code: >>>>> #include >>>>> static char help[] = "dmplex"; >>>>> >>>>> int main(int argc, char **argv) >>>>> { >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); >>>>> DM dm, dm_dist; >>>>> PetscSection section; >>>>> PetscInt cStart, cEndInterior, cEnd, rank; >>>>> PetscInt nc[3] = {3, 3, 3}; >>>>> PetscReal upper[3] = {1, 1, 1}; >>>>> >>>>> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank)); >>>>> >>>>> DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, nc, NULL, upper, >>>>> NULL, PETSC_TRUE, &dm); >>>>> DMViewFromOptions(dm, NULL, "-dm1_view"); >>>>> PetscCall(DMSetFromOptions(dm)); >>>>> DMViewFromOptions(dm, NULL, "-dm2_view"); >>>>> >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_INTERIOR_GHOST, >>>>> &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "Before Distribution Rank: %d, cStart: >>>>> %d, cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> PetscInt nField = 1, nDof = 3, field = 0; >>>>> PetscCall(PetscSectionCreate(PETSC_COMM_WORLD, §ion)); >>>>> PetscSectionSetNumFields(section, nField); >>>>> PetscCall(PetscSectionSetChart(section, cStart, cEnd)); >>>>> for (PetscInt p = cStart; p < cEnd; p++) >>>>> { >>>>> PetscCall(PetscSectionSetFieldDof(section, p, field, nDof)); >>>>> PetscCall(PetscSectionSetDof(section, p, nDof)); >>>>> } >>>>> >>>>> PetscCall(PetscSectionSetUp(section)); >>>>> >>>>> DMSetLocalSection(dm, section); >>>>> DMViewFromOptions(dm, NULL, "-dm3_view"); >>>>> >>>>> DMSetAdjacency(dm, field, PETSC_TRUE, PETSC_TRUE); >>>>> DMViewFromOptions(dm, NULL, "-dm4_view"); >>>>> PetscCall(DMPlexDistribute(dm, 1, NULL, &dm_dist)); >>>>> if (dm_dist) >>>>> { >>>>> DMDestroy(&dm); >>>>> dm = dm_dist; >>>>> } >>>>> DMViewFromOptions(dm, NULL, "-dm5_view"); >>>>> PetscCall(DMPlexGetDepthStratum(dm, 3, &cStart, &cEnd)); >>>>> DMPlexComputeCellTypes(dm); >>>>> PetscCall(DMPlexGetCellTypeStratum(dm, DM_POLYTOPE_FV_GHOST, >>>>> &cEndInterior, NULL)); >>>>> PetscPrintf(PETSC_COMM_SELF, "After Distribution Rank: %d, cStart: %d, >>>>> cEndInterior: %d, cEnd: %d\n", rank, cStart, >>>>> cEndInterior, cEnd); >>>>> >>>>> DMDestroy(&dm); >>>>> PetscCall(PetscFinalize()); >>>>> } >>>>> >>>>> This codes output is currently (on 2 processors) is: >>>>> Before Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 14 >>>>> Before Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 13 >>>>> After Distribution Rank: 0, cStart: 0, cEndInterior: -1, cEnd: 27 >>>>> After Distribution Rank: 1, cStart: 0, cEndInterior: -1, cEnd: 24 >>>>> >>>>> DMView outputs: >>>>> dm1_view (after creation): >>>>> DM Object: 2 
MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 64 0 >>>>> Number of 1-cells per rank: 144 0 >>>>> Number of 2-cells per rank: 108 0 >>>>> Number of 3-cells per rank: 27 0 >>>>> Labels: >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (6 (9), 5 (9), 3 (9), 4 (9), 1 >>>>> (9), 2 (9)) >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>> celltype: 4 strata with value/size (7 (27), 0 (64), 4 (108), 1 (144)) >>>>> >>>>> dm2_view (after setfromoptions): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> >>>>> dm3_view (after setting local section): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: >>>>> adjacency FEM >>>>> >>>>> dm4_view (after setting adjacency): >>>>> DM Object: 2 MPI processes >>>>> type: plex >>>>> DM_0x84000004_0 in 3 dimensions: >>>>> Number of 0-cells per rank: 40 46 >>>>> Number of 1-cells per rank: 83 95 >>>>> Number of 2-cells per rank: 57 64 >>>>> Number of 3-cells per rank: 13 14 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (40), 1 (83), 2 (57), 3 (13)) >>>>> marker: 1 strata with value/size (1 (109)) >>>>> Face Sets: 5 strata with value/size (1 (6), 2 (1), 3 (7), 5 (5), 6 (4)) >>>>> celltype: 4 strata with value/size (0 (40), 1 (83), 4 (57), 7 (13)) >>>>> Field Field_0: >>>>> adjacency FVM++ >>>>> >>>>> dm5_view (after distribution): >>>>> DM Object: Parallel Mesh 2 MPI processes >>>>> type: plex >>>>> Parallel Mesh in 3 dimensions: >>>>> Number of 0-cells per rank: 64 60 >>>>> Number of 1-cells per rank: 144 133 >>>>> Number of 2-cells per rank: 108 98 >>>>> Number of 3-cells per rank: 27 24 >>>>> Labels: >>>>> depth: 4 strata with value/size (0 (64), 1 (144), 2 (108), 3 (27)) >>>>> marker: 1 strata with value/size (1 (218)) >>>>> Face Sets: 6 strata with value/size (1 (9), 2 (9), 3 (9), 4 (9), 5 >>>>> (9), 6 (9)) >>>>> celltype: 4 strata with value/size (0 (64), 1 (144), 4 (108), 7 (27)) >>>>> Field Field_0: >>>>> adjacency FVM++ >>>>> >>>>> Thanks, >>>>> Guer. >>>>> Sent with Proton Mail secure email. >>>>> >>>>> ------- Original Message ------- >>>>> On Wednesday, October 11th, 2023 at 3:33 AM, Matthew Knepley < >>>>> knepley at gmail.com> wrote: >>>>> >>>>> On Tue, Oct 10, 2023 at 7:01?PM erdemguer wrote: >>>>> >>>>>> >>>>>> Hi, >>>>>> Sorry for my late response. I tried with your suggestions and I think >>>>>> I made a progress. But I still got issues. Let me explain my latest mesh >>>>>> routine: >>>>>> >>>>>> >>>>>> 1. DMPlexCreateBoxMesh >>>>>> 2. 
DMSetFromOptions >>>>>> 3. PetscSectionCreate >>>>>> 4. PetscSectionSetNumFields >>>>>> 5. PetscSectionSetFieldDof >>>>>> 6. PetscSectionSetDof >>>>>> 7. PetscSectionSetUp >>>>>> 8. DMSetLocalSection >>>>>> 9. DMSetAdjacency >>>>>> 10. DMPlexDistribute >>>>>> >>>>>> >>>>>> It's still not working but it's promising, if I call >>>>>> DMPlexGetDepthStratum for cells, I can see that after distribution >>>>>> processors have more cells. >>>>>> >>>>> >>>>> Please send the output of DMPlexView() for each incarnation of the >>>>> mesh. What I do is put >>>>> >>>>> DMViewFromOptions(dm, NULL, "-dm1_view") >>>>> >>>>> >>>>> with a different string after each call. >>>>> >>>>>> But I couldn't figure out how to decide where the ghost/processor >>>>>> boundary cells start. >>>>>> >>>>> >>>>> Please send the actual code because the above is not specific enough. >>>>> For example, you will not have >>>>> "ghost cells" unless you partition with overlap. This is because by >>>>> default cells are the partitioned quantity, >>>>> so each process gets a unique set. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> In older mails I saw there is a function DMPlexGetHybridBounds but I >>>>>> think that function is deprecated. I tried to use, >>>>>> DMPlexGetCellTypeStratum as in ts/tutorials/ex11_sa.c but I'm >>>>>> getting -1 as cEndInterior before and after distribution. I tried it for DM_POLYTOPE_FV_GHOST, >>>>>> DM_POLYTOPE_INTERIOR_GHOST polytope types. I also tried calling >>>>>> DMPlexComputeCellTypes before DMPlexGetCellTypeStratum but nothing changed. >>>>>> I think I can calculate the ghost cell indices using cStart/cEnd before & >>>>>> after distribution but I think there is a better way I'm currently missing. >>>>>> >>>>>> Thanks again, >>>>>> Guer. >>>>>> >>>>>> ------- Original Message ------- >>>>>> On Thursday, September 28th, 2023 at 10:42 PM, Matthew Knepley < >>>>>> knepley at gmail.com> wrote: >>>>>> >>>>>> On Thu, Sep 28, 2023 at 3:38?PM erdemguer via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am currently using DMPlex in my code. It runs serially at the >>>>>>> moment, but I'm interested in adding parallel options. Here is my workflow: >>>>>>> >>>>>>> Create a DMPlex mesh from GMSH. >>>>>>> Reorder it with DMPlexPermute. >>>>>>> Create necessary pre-processing arrays related to the mesh/problem. >>>>>>> Create field(s) with multi-dofs. >>>>>>> Create residual vectors. >>>>>>> Define a function to calculate the residual for each cell and, use >>>>>>> SNES. >>>>>>> As you can see, I'm not using FV or FE structures (most examples >>>>>>> do). Now, I'm trying to implement this in parallel using a similar >>>>>>> approach. However, I'm struggling to understand how to create corresponding >>>>>>> vectors and how to obtain index sets for each processor. Is there a >>>>>>> tutorial or paper that covers this topic? >>>>>>> >>>>>> >>>>>> The intention was that there is enough information in the manual to >>>>>> do this. >>>>>> >>>>>> Using PetscFE/PetscFV is not required. However, I strongly encourage >>>>>> you to use PetscSection. Without this, it would be incredibly hard to do >>>>>> what you want. Once the DM has a Section, it can do things like >>>>>> automatically create vectors and matrices for you. It can redistribute >>>>>> them, subset them, etc. The Section describes how dofs are assigned to >>>>>> pieces of the mesh (mesh points). This is in the manual, and there are a >>>>>> few examples that do it by hand. 
>>>>>> >>>>>> So I suggest changing your code to use PetscSection, and then letting >>>>>> us know if things still do not work. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Thank you. >>>>>>> Guer. >>>>>>> >>>>>>> Sent with Proton Mail secure email. >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Oct 16 13:29:30 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 16 Oct 2023 18:29:30 +0000 Subject: [petsc-users] Using Sundials from PETSc Message-ID: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. Thanks, Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 16 14:03:35 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 15:03:35 -0400 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, we were wondering if it would be possible to call the latest version > of Sundials from PETSc? > The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for update until now. We would be happy to get an MR for the updates if you want to try it. > We are interested in doing chemistry using GPUs and already have > interfaces to PETSc from our code. 
> How does the GPU interest interact with the SUNDIALS version? Thanks, Matt > Thanks, > Marcos > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Oct 16 14:11:37 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 16 Oct 2023 14:11:37 -0500 (CDT) Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: <459f4b88-e5da-2123-9fcd-b5ab9653c0b6@mcs.anl.gov> I'll note - current sundials release has some interfaces to petsc functionality Satish On Mon, 16 Oct 2023, Matthew Knepley wrote: > On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hi, we were wondering if it would be possible to call the latest version > > of Sundials from PETSc? > > > > The short answer is, no. We are at v2.5 and they are at v6.5. There were no > dates on the version history page, so I do not know how out of date we are. > There have not been any requests for update until now. > > We would be happy to get an MR for the updates if you want to try it. > > > > We are interested in doing chemistry using GPUs and already have > > interfaces to PETSc from our code. > > > > How does the GPU interest interact with the SUNDIALS version? > > Thanks, > > Matt > > > > Thanks, > > Marcos > > > > > From junchao.zhang at gmail.com Mon Oct 16 14:24:28 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 16 Oct 2023 14:24:28 -0500 Subject: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface In-Reply-To: References: Message-ID: Hi, Philip, That branch was merged to petsc/main today. Let me know once you have new profiling results. Thanks. --Junchao Zhang On Mon, Oct 16, 2023 at 9:33?AM Fackler, Philip wrote: > Junchao, > > I've attached updated timing plots (red and blue are swapped from before; > yellow is the new one). There is an improvement for the NE_3 case only with > CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI > cases, MatShift doesn't show up (I assume because we're using different > preconditioner arguments). So, there must be some other primary culprit. > I'll try to get updated profiling data to you soon. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Fackler, Philip via Xolotl-psi-development < > xolotl-psi-development at lists.sourceforge.net> > *Sent:* Wednesday, October 11, 2023 11:31 > *To:* Junchao Zhang > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] > Unexpected performance losses switching to COO interface > > I'm on it. 
> > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Wednesday, October 11, 2023 10:14 > *To:* Fackler, Philip > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie < > sblondel at utk.edu> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses > switching to COO interface > > Hi, Philip, > Could you try this branch > jczhang/2023-10-05/feature-support-matshift-aijkokkos ? > > Thanks. > --Junchao Zhang > > > On Thu, Oct 5, 2023 at 4:52?PM Fackler, Philip wrote: > > Aha! That makes sense. Thank you. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang > *Sent:* Thursday, October 5, 2023 17:29 > *To:* Fackler, Philip > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie < > sblondel at utk.edu> > *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses > switching to COO interface > > Wait a moment, it seems it was because we do not have a GPU implementation > of MatShift... > Let me see how to add it. > --Junchao Zhang > > > On Thu, Oct 5, 2023 at 10:58?AM Junchao Zhang > wrote: > > Hi, Philip, > I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() > instead of the COO interface? MatSetValues() needs to copy the data from > device to host and thus is expensive. > Do you have profiling results with COO enabled? > > [image: Screenshot 2023-10-05 at 10.55.29?AM.png] > > > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:52?AM Junchao Zhang > wrote: > > Hi, Philip, > I will look into the tarballs and get back to you. > Thanks. > --Junchao Zhang > > > On Mon, Oct 2, 2023 at 9:41?AM Fackler, Philip via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > We finally have xolotl ported to use the new COO interface and the > aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port > to our previous version (using MatSetValuesStencil and the default Mat and > Vec implementations), we expected to see an improvement in performance for > both the "serial" and "cuda" builds (here I'm referring to the kokkos > configuration). > > Attached are two plots that show timings for three different cases. All of > these were run on Ascent (the Summit-like training system) with 6 MPI tasks > (on a single node). The CUDA cases were given one GPU per task (and used > CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases > we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent > as possible. > > The performance of RHSJacobian (where the bulk of computation happens in > xolotl) behaved basically as expected (better than expected in the serial > build). NE_3 case in CUDA was the only one that performed worse, but not > surprisingly, since its workload for the GPUs is much smaller. We've still > got more optimization to do on this. > > The real surprise was how much worse the overall solve times were. This > seems to be due simply to switching to the kokkos-based implementation. 
I'm > wondering if there are any changes we can make in configuration or runtime > arguments to help with PETSc's performance here. Any help looking into this > would be appreciated. > > The tarballs linked here > > and here > > are profiling databases which, once extracted, can be viewed with > hpcviewer. I don't know how helpful that will be, but hopefully it can give > you some direction. > > Thanks for your help, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Oct 16 15:07:58 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 16 Oct 2023 20:07:58 +0000 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: Hi Mathew, we have code that time splits the combustion step from the chemical species transport, so on each computational cell for each fluid flow time step, once transport is done we have the mixture chemical composition as initial condition. We are looking into doing finite rate chemistry with skeletal combustion models (20+ equations) in each cell for each fluid time step. Sundials provides the CVODE solver for the time integration of these, and would be interesting to see if we can make use of GPU acceleration. From their User Guide for Version 6.6.0 there are several GPU implementations for building RHS and using linear, nonlinear and stiff ODE solvers. Thank you Satish for the comment. Might be better at this point to first get an idea on what the implementation in our code using Sundials directly would look like. Then, we can see if it is possible and makes sense to access it through PETSc. We have things working in CPU making use of and older version of CVODE. BTW after some changes in our code we are starting running larger cases using GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced already. Thanks! ________________________________ From: Matthew Knepley Sent: Monday, October 16, 2023 3:03 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Paul, Chandan (IntlAssoc) Subject: Re: [petsc-users] Using Sundials from PETSc On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for update until now. We would be happy to get an MR for the updates if you want to try it. We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. How does the GPU interest interact with the SUNDIALS version? Thanks, Matt Thanks, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Oct 16 15:31:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 16 Oct 2023 16:31:14 -0400 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: On Mon, Oct 16, 2023 at 4:08?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Mathew, we have code that time splits the combustion step from the > chemical species transport, so on each computational cell for each fluid > flow time step, once transport is done we have the mixture chemical > composition as initial condition. We are looking into doing finite rate > chemistry with skeletal combustion models (20+ equations) in each cell for > each fluid time step. Sundials provides the CVODE solver for the time > integration of these, and would be interesting to see if we can make use of > GPU acceleration. From their User Guide for Version 6.6.0 there are several > GPU implementations for building RHS and using linear, nonlinear and stiff > ODE solvers. > We are doing a similar thing in CHREST (https://www.buffalo.edu/chrest.html). Since we normally use hundreds of species and thousands of reactions for the reduced mechanism, we are using TChem2 to build and solve the system in each cell. Since these systems are so small, you are likely to need some way of batching them within a warp. Do you have an idea for this already? Thanks, Matt > Thank you Satish for the comment. Might be better at this point to first > get an idea on what the implementation in our code using Sundials directly > would look like. Then, we can see if it is possible and makes sense to > access it through PETSc. > We have things working in CPU making use of and older version of CVODE. > > BTW after some changes in our code we are starting running larger cases > using GPU accelerated iterative solvers from PETSc, so we have PETSc > interfaced already. > > Thanks! > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Monday, October 16, 2023 3:03 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Paul, Chandan > (IntlAssoc) > *Subject:* Re: [petsc-users] Using Sundials from PETSc > > On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, we were wondering if it would be possible to call the latest version > of Sundials from PETSc? > > > The short answer is, no. We are at v2.5 and they are at v6.5. There were > no dates on the version history page, so I do not know how out of date we > are. There have not been any requests for update until now. > > We would be happy to get an MR for the updates if you want to try it. > > > We are interested in doing chemistry using GPUs and already have > interfaces to PETSc from our code. > > > How does the GPU interest interact with the SUNDIALS version? > > Thanks, > > Matt > > > Thanks, > Marcos > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marcos.vanella at nist.gov Mon Oct 16 16:15:26 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 16 Oct 2023 21:15:26 +0000 Subject: [petsc-users] Using Sundials from PETSc In-Reply-To: References: Message-ID: Hi Matt, very interesting project you are working on. We haven't gone deep on how we would do this in GPUs and are starting to look at options. We will explore if it is possible to batch work needed for several cells within a thread group on the gpu. We use a single Cartesian mesh per MPI process (usually with 40^3 to 50^3 cells). Something I implemented to avoid the MPI process over-subscription of GPU with PETSc solvers was to cluster several MPI Processes per GPU on resource sets. Then, the processes in the set would pass matrix (at setup) and RHS to a single process (set master) which communicates with the GPU. The GPU solution is then brought back to the set master which distributes it to the MPI processes in the set as needed. So, only a set of processes as large as the number of GPUs in the calculation (with their own MPI communicator) call the PETSc matrix and vector building, and solve routines. The neat thing is that all MPI communications are local to the node. This idea is not new, it was developed by the researchers at GWU that interfaced PETSc to AMGx back when there were no native GPU solvers in PETSc, HYPRE and other libs (~2016). Best, Marcos ________________________________ From: Matthew Knepley Sent: Monday, October 16, 2023 4:31 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Paul, Chandan (IntlAssoc) Subject: Re: [petsc-users] Using Sundials from PETSc On Mon, Oct 16, 2023 at 4:08?PM Vanella, Marcos (Fed) > wrote: Hi Mathew, we have code that time splits the combustion step from the chemical species transport, so on each computational cell for each fluid flow time step, once transport is done we have the mixture chemical composition as initial condition. We are looking into doing finite rate chemistry with skeletal combustion models (20+ equations) in each cell for each fluid time step. Sundials provides the CVODE solver for the time integration of these, and would be interesting to see if we can make use of GPU acceleration. From their User Guide for Version 6.6.0 there are several GPU implementations for building RHS and using linear, nonlinear and stiff ODE solvers. We are doing a similar thing in CHREST (https://www.buffalo.edu/chrest.html). Since we normally use hundreds of species and thousands of reactions for the reduced mechanism, we are using TChem2 to build and solve the system in each cell. Since these systems are so small, you are likely to need some way of batching them within a warp. Do you have an idea for this already? Thanks, Matt Thank you Satish for the comment. Might be better at this point to first get an idea on what the implementation in our code using Sundials directly would look like. Then, we can see if it is possible and makes sense to access it through PETSc. We have things working in CPU making use of and older version of CVODE. BTW after some changes in our code we are starting running larger cases using GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced already. Thanks! 
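For what it's worth, a minimal sketch of that resource-set grouping (purely illustrative, not the actual implementation: build_resource_sets, ngpus_per_node, set_comm and master_comm are made-up names, and ngpus_per_node is assumed to be known, e.g. from the resource manager):

#include <mpi.h>

/* Split COMM_WORLD into node-local sets, one per GPU. Rank 0 of each set is the
   "set master"; only masters end up in master_comm, which is what PETSc would be
   initialized on (e.g. by assigning it to PETSC_COMM_WORLD before PetscInitialize).
   The remaining ranks get MPI_COMM_NULL and only exchange matrix/RHS/solution data
   with their master over set_comm. */
static void build_resource_sets(int ngpus_per_node, MPI_Comm *set_comm, MPI_Comm *master_comm)
{
  MPI_Comm node_comm;
  int      world_rank, node_rank, set_rank;

  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
  MPI_Comm_rank(node_comm, &node_rank);
  MPI_Comm_split(node_comm, node_rank % ngpus_per_node, node_rank, set_comm);
  MPI_Comm_rank(*set_comm, &set_rank);
  MPI_Comm_split(MPI_COMM_WORLD, set_rank == 0 ? 0 : MPI_UNDEFINED, world_rank, master_comm);
  MPI_Comm_free(&node_comm);
}

All of the gather/scatter traffic then stays on set_comm, i.e. within the node.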
________________________________ From: Matthew Knepley > Sent: Monday, October 16, 2023 3:03 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov >; Paul, Chandan (IntlAssoc) > Subject: Re: [petsc-users] Using Sundials from PETSc On Mon, Oct 16, 2023 at 2:29?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for update until now. We would be happy to get an MR for the updates if you want to try it. We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. How does the GPU interest interact with the SUNDIALS version? Thanks, Matt Thanks, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Oct 17 13:31:15 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 17 Oct 2023 20:31:15 +0200 Subject: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) In-Reply-To: References: <89E53665-4C0D-4583-9C90-13C4C108A4EA@dsic.upv.es> <442B3841-B668-4185-9C6F-D03CA481CA26@dsic.upv.es> Message-ID: Kenneth, I have worked a bit more on your example and put it in SLEPc https://gitlab.com/slepc/slepc/-/merge_requests/596 This version also has MATOP_DESTROY to avoid memory leaks. Thanks. Jose > El 12 oct 2023, a las 20:59, Kenneth C Hall escribi?: > > Jose, > > Thanks very much for this. I will give it a try and let you know how it works. > > Best regards, > Kenneth > > From: Jose E. Roman > Date: Thursday, October 12, 2023 at 2:12 PM > To: Kenneth C Hall > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SLEPc/NEP for shell matrice T(lambda) and T'(lambda) > > I am attaching your example modified with the context stuff. > > With the PETSc branch that I indicated, now it works with NLEIGS, for instance: > > $ ./test_nep -nep_nleigs_ksp_type gmres -nep_nleigs_pc_type none -rg_interval_endpoints 0.2,1.1 -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_view -nep_error_relative ::ascii_info_detail > > And also other solvers such as SLP: > > $ ./test_nep -nep_type slp -nep_slp_ksp_type gmres -nep_slp_pc_type none -nep_target 0.8 -nep_nev 5 -n 400 -nep_monitor -nep_error_relative ::ascii_info_detail > > I will clean the example code an add it as a SLEPc example. > > Regards, > Jose > > > > El 11 oct 2023, a las 17:27, Kenneth C Hall escribi?: > > > > Jose, > > > > Thanks very much for your help with this. Greatly appreciated. I will look at the MR. Please let me know if you do get the Fortran example working. 
> > > > Thanks, and best regards, > > Kenneth > > From degregori at dkrz.de Wed Oct 18 04:54:43 2023 From: degregori at dkrz.de (Enrico) Date: Wed, 18 Oct 2023 11:54:43 +0200 Subject: [petsc-users] Coordinate format internal reordering Message-ID: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> Hello, I'm trying to use Petsc to solve a linear system in an application. I'm using the coordinate format to define the matrix and the vector (it should work better on GPU but at the moment every test is on CPU). After the call to VecSetValuesCOO, I've noticed that the vector is storing the data in a different way from my application. For example with two processes in the application process 0 owns cells 2, 3, 4 process 1 owns cells 0, 1, 5 But in the vector data structure of Petsc process 0 owns cells 0, 1, 2 process 1 owns cells 3, 4, 5 This is in principle not a big issue, but after solving the linear system I get the solution vector x and I want to get the values in the correct processes. Is there a way to get vector values from other processes or to get a mapping so that I can do it myself? Cheers, Enrico Degregori From yc17470 at connect.um.edu.mo Wed Oct 18 05:06:38 2023 From: yc17470 at connect.um.edu.mo (Gong Yujie) Date: Wed, 18 Oct 2023 10:06:38 +0000 Subject: [petsc-users] Error when installing PETSc Message-ID: Dear PETSc developers, I got an error message when installing PETSc with a clang compiler. Could you please help me find the problem? The configure.log is attached. Best Regards, Yujie Here is the detail of the error: ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= ============================================================================================= ***** WARNING: Using default optimization C flags -g -O3 You might consider manually setting optimal optimization flags for your system with COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples ============================================================================================= ============================================================================================= ***** WARNING: Using default Cxx optimization flags -g -O3 You might consider manually setting optimal optimization flags for your system with CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples ============================================================================================= ============================================================================================= ***** WARNING: Using default FORTRAN optimization flags -O You might consider manually setting optimal optimization flags for your system with FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples ============================================================================================= ============================================================================================= Trying to download https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz for ============================================================================================= ============================================================================================= Running configure on OPENMPI; this may take several minutes ============================================================================================= 
============================================================================================= Running make on OPENMPI; this may take several minutes ============================================================================================= ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error running make; make install on OPENMPI ******************************************************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1016438 bytes Desc: configure.log URL: From knepley at gmail.com Wed Oct 18 06:39:29 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Oct 2023 07:39:29 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> Message-ID: On Wed, Oct 18, 2023 at 5:55?AM Enrico wrote: > Hello, > > I'm trying to use Petsc to solve a linear system in an application. I'm > using the coordinate format to define the matrix and the vector (it > should work better on GPU but at the moment every test is on CPU). After > the call to VecSetValuesCOO, I've noticed that the vector is storing the > data in a different way from my application. For example with two > processes in the application > > process 0 owns cells 2, 3, 4 > > process 1 owns cells 0, 1, 5 > > But in the vector data structure of Petsc > > process 0 owns cells 0, 1, 2 > > process 1 owns cells 3, 4, 5 > > This is in principle not a big issue, but after solving the linear > system I get the solution vector x and I want to get the values in the > correct processes. Is there a way to get vector values from other > processes or to get a mapping so that I can do it myself? > By definition, PETSc vectors and matrices own contiguous row blocks. If you want to have another, global ordering, we support that with https://petsc.org/main/manualpages/AO/ Thanks, Matt > Cheers, > Enrico Degregori > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 18 06:41:27 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 18 Oct 2023 07:41:27 -0400 Subject: [petsc-users] Error when installing PETSc In-Reply-To: References: Message-ID: On Wed, Oct 18, 2023 at 6:07?AM Gong Yujie wrote: > Dear PETSc developers, > > I got an error message when installing PETSc with a clang compiler. Could > you please help me find the problem? The configure.log is attached. 
> Your compiler segfaulted when compiling OpenMPI: Making all in mca/crs make[2]: Entering directory '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' GENERATE opal_crs.7 CC base/crs_base_open.lo CC base/crs_base_close.lo CC base/crs_base_select.lo CC base/crs_base_fns.lo make[2]: Leaving directory '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' make[1]: Leaving directory '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal'/bin/sh: line 7: 6327 Illegal instruction (core dumped) ../../../config/ make_manpage.pl --package-name='Open MPI' --package-version='4.1.0' --ompi-date='Dec 18, 2020' --opal-date='Dec 18, 2020' --orte-date='Dec 18, 2020' --input=opal_crs.7in --output=opal_crs.7 make[2]: *** [Makefile:2215: opal_crs.7] Error 132 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [Makefile:2383: all-recursive] Error 1 make: *** [Makefile:1901: all-recursive] Error 1 I suggest compiling MPICH instead. Thanks, Matt > Best Regards, > Yujie > > Here is the detail of the error: > > ============================================================================================= > > Configuring PETSc to compile on your system > > ============================================================================================= > ============================================================================================= > > ***** WARNING: Using default optimization C flags -g -O3 > You might consider manually setting optimal optimization flags for your > system with > COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > examples > ============================================================================================= > > ============================================================================================= > > ***** WARNING: Using default Cxx optimization flags -g -O3 > > You might consider manually setting optimal optimization flags for your > system with > CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > examples > ============================================================================================= > > ============================================================================================= > > ***** WARNING: Using default FORTRAN optimization flags -O > > You might consider manually setting optimal optimization flags for your > system with > FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > examples > ============================================================================================= > > ============================================================================================= > > Trying to download > https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz > for > ============================================================================================= > > ============================================================================================= > > Running configure on OPENMPI; this may take several minutes > > ============================================================================================= > > ============================================================================================= > > Running make on OPENMPI; this may take several minutes > > ============================================================================================= > > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > 
details): > > ------------------------------------------------------------------------------- > Error running make; make install on OPENMPI > > ******************************************************************************* > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Oct 18 07:15:36 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 18 Oct 2023 08:15:36 -0400 Subject: [petsc-users] About recent changes in GAMG In-Reply-To: References: Message-ID: Hi Jeremy, I hope you don't mind putting this on the list (w/o data), but this is documentation and you are the second user that found regressions. Sorry for the churn. There is a lot here so we can iterate, but here is a pass at your questions. *** Using MIS-2 instead of square graph was motivated by setup cost/performance but on GPUs with some recent fixes in Kokkos (in a branch) square graph seems OK. My experience was that square graph is better in terms of quality and we have a power user, like you all, that found this also. So I switched the default back to square graph. Interesting that you found that MIS-2 (new method) could be faster, but it might be because the two methods coarsen at different rates and that can make a big difference. (the way to test would be to adjust parameters to get similar coarsen rates, but I digress) It's hard to understand the differences between these two methods in terms of aggregate quality so we need to just experiment and have options. *** As far as your thermal problem. There was a complaint that the eigen estimates for chebyshev smoother were not recomputed for nonlinear problems and I added an option to do that and turned it on by default: Use '-pc_gamg_recompute_esteig false' to get back to the original. (I should have turned it off by default) Now, if your problem is symmetric and you use CG to compute the eigen estimates there should be no difference. If you use CG to compute the eigen estimates in GAMG (and have GAMG give them to cheby, the default) that when you recompute the eigen estimates the cheby eigen estimator is used and that will use gmres by default unless you set the SPD property in your matrix. So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left) CG is a much better estimator for SPD. And I found that the cheby eigen estimator uses an LAPACK *eigen* method to compute the eigen bounds and GAMG uses a *singular value* method. The two give very different results on the lid driven cavity test (ex19). eigen is lower, which is safer but not optimal if it is too low. I have a branch to have cheby use the singular value method, but I don't plan on merging it (enough churn and I don't understand these differences). *** '-pc_gamg_low_memory_threshold_filter false' recovers the old filtering method. This is the default now because there is a bug in the (new) low memory filter. This bug is very rare and catastrophic. We are working on it and will turn it on by default when it's fixed. This does not affect the semantics of the solver, just work and memory complexity. *** As far as tet4 vs tet10, I would guess that tet4 wants more aggressive coarsening. The default is to do aggressive on one (1) level. 
You might want more levels for tet4. And the new MIS-k coarsening can use any k (default is 2) wth '-mat_coarsen_misk_distance k' (eg, k=3) I have not added hooks to have a more complex schedule to specify the method on each level. Thanks, Mark On Tue, Oct 17, 2023 at 9:33?PM Jeremy Theler (External) < jeremy.theler-ext at ansys.com> wrote: > Hey Mark > > Regarding the changes in the coarsening algorithm in 3.20 with respect to > 3.19 in general we see that for some problems the MIS strategy gives and > overall performance which is slightly better and for some others it is > slightly worse than the "baseline" from 3.19. > We also saw that current main has switched back to the old square > coarsening algorithm by default, which again, in some cases is better and > in others is worse than 3.19 without any extra command-line option. > > Now what seems weird to us is that we have a test case which is a heat > conduction problem with radiation boundary conditions (so it is non linear) > using tet10 and we see > > 1. that in parallel v3.20 is way worse than v3.19, although the memory > usage is similar > 2. that petsc main (with no extra flags, just the defaults) recover > the 3.19 performance but memory usage is significantly larger > > > I tried using the -pc_gamg_low_memory_threshold_filter flag and the > results were the same. > > Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI > ranks. > Is there any explanation about these two points we are seeing? > Another weird finding is that if we use tet4 instead of tet10, v3.20 is > only 10% slower than the other two and main does not need more memory than > the other two. > > BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and main > should you be interested. > > Let me know if it is better to move this discussion into the PETSc mailing > list. > > Regards, > jeremy theler > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Oct 18 09:33:33 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 18 Oct 2023 09:33:33 -0500 (CDT) Subject: [petsc-users] Error when installing PETSc In-Reply-To: References: Message-ID: <38ad7e4e-1dc8-6dbf-c34d-1bbc6b8180fa@mcs.anl.gov> > Working directory: /home/tt/petsc-3.16.0 use latest petsc release - 3.20 > --with-fc=flang I don't think this ever worked. Use --with-fc=gfortran instead /opt/ohpc/pub/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.3.0/m4-1.4.19-lwqcw3hzoxoia5q6nzolylxaf5zevluk/bin/m4: internal error detected; please report this bug to : Illegal instruction You might need to report this to your admin who installed this spack package. They might need to rebuild spack for 'x86_64' instead of 'skylake_avx512' Or use a different m4 - say from /usr/bin - if you have it there. Satish On Wed, 18 Oct 2023, Matthew Knepley wrote: > On Wed, Oct 18, 2023 at 6:07?AM Gong Yujie > wrote: > > > Dear PETSc developers, > > > > I got an error message when installing PETSc with a clang compiler. Could > > you please help me find the problem? The configure.log is attached. 
> > > > Your compiler segfaulted when compiling OpenMPI: > > Making all in mca/crs > make[2]: Entering directory > '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' > GENERATE opal_crs.7 > CC base/crs_base_open.lo > CC base/crs_base_close.lo > CC base/crs_base_select.lo > CC base/crs_base_fns.lo > make[2]: Leaving directory > '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal/mca/crs' > make[1]: Leaving directory > '/home/tt/petsc-3.16.0/optamd/externalpackages/openmpi-4.1.0/opal'/bin/sh: > line 7: 6327 Illegal instruction (core dumped) ../../../config/ > make_manpage.pl --package-name='Open MPI' --package-version='4.1.0' > --ompi-date='Dec 18, 2020' --opal-date='Dec 18, 2020' --orte-date='Dec 18, > 2020' --input=opal_crs.7in --output=opal_crs.7 > make[2]: *** [Makefile:2215: opal_crs.7] Error 132 > make[2]: *** Waiting for unfinished jobs.... > make[1]: *** [Makefile:2383: all-recursive] Error 1 > make: *** [Makefile:1901: all-recursive] Error 1 > > I suggest compiling MPICH instead. > > Thanks, > > Matt > > > > Best Regards, > > Yujie > > > > Here is the detail of the error: > > > > ============================================================================================= > > > > Configuring PETSc to compile on your system > > > > ============================================================================================= > > ============================================================================================= > > > > ***** WARNING: Using default optimization C flags -g -O3 > > You might consider manually setting optimal optimization flags for your > > system with > > COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > > examples > > ============================================================================================= > > > > ============================================================================================= > > > > ***** WARNING: Using default Cxx optimization flags -g -O3 > > > > You might consider manually setting optimal optimization flags for your > > system with > > CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > > examples > > ============================================================================================= > > > > ============================================================================================= > > > > ***** WARNING: Using default FORTRAN optimization flags -O > > > > You might consider manually setting optimal optimization flags for your > > system with > > FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for > > examples > > ============================================================================================= > > > > ============================================================================================= > > > > Trying to download > > https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz > > for > > ============================================================================================= > > > > ============================================================================================= > > > > Running configure on OPENMPI; this may take several minutes > > > > ============================================================================================= > > > > ============================================================================================= > > > > Running make on OPENMPI; this may take several minutes > > > > 
============================================================================================= > > > > > > ******************************************************************************* > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > > ------------------------------------------------------------------------------- > > Error running make; make install on OPENMPI > > > > ******************************************************************************* > > > > > > From degregori at dkrz.de Thu Oct 19 05:51:39 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 12:51:39 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> Message-ID: <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> Hello, if I create an application ordering using AOCreateBasic, should I provide the same array for const PetscInt myapp[] and const PetscInt mypetsc[] in order to get the same ordering of the application within PETSC? And once I define the ordering so that the local vector and matrix are defined in PETSC as in my application, how can I use it to create the actual vector and matrix? Thanks in advance for the help. Cheers, Enrico On 18/10/2023 13:39, Matthew Knepley wrote: > On Wed, Oct 18, 2023 at 5:55?AM Enrico > wrote: > > Hello, > > I'm trying to use Petsc to solve a linear system in an application. I'm > using the coordinate format to define the matrix and the vector (it > should work better on GPU but at the moment every test is on CPU). > After > the call to VecSetValuesCOO, I've noticed that the vector is storing > the > data in a different way from my application. For example with two > processes in the application > > process 0 owns cells 2, 3, 4 > > process 1 owns cells 0, 1, 5 > > But in the vector data structure of Petsc > > process 0 owns cells 0, 1, 2 > > process 1 owns cells 3, 4, 5 > > This is in principle not a big issue, but after solving the linear > system I get the solution vector x and I want to get the values in the > correct processes. Is there a way to get vector values from other > processes or to get a mapping so that I can do it myself? > > > By definition, PETSc vectors and matrices own contiguous row blocks. If > you want to have another, > global ordering, we support that with > https://petsc.org/main/manualpages/AO/ > > > ? Thanks, > > ? ? ?Matt > > Cheers, > Enrico Degregori > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 07:50:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 08:50:46 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 6:51?AM Enrico wrote: > Hello, > > if I create an application ordering using AOCreateBasic, should I > provide the same array for const PetscInt myapp[] and const PetscInt > mypetsc[] in order to get the same ordering of the application within > PETSC? > Are you asking if the identity permutation can be constructed using the same array twice? Yes. 
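For concreteness, a minimal sketch of that call (the index values are made up
for illustration; this is not code from your application):

  #include <petscao.h>

  int main(int argc, char **argv)
  {
    AO       ao;
    /* application (global) indices of the rows this rank owns */
    PetscInt myapp[3]   = {2, 3, 4};
    /* the PETSc indices they map to; passing myapp here as well
       gives the identity permutation */
    PetscInt mypetsc[3] = {0, 1, 2};

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(AOCreateBasic(PETSC_COMM_WORLD, 3, myapp, mypetsc, &ao));
    PetscCall(AOView(ao, PETSC_VIEWER_STDOUT_WORLD));
    /* AOApplicationToPetsc(ao, n, idx) rewrites application indices into
       PETSc indices in place; AOPetscToApplication goes the other way */
    PetscCall(AODestroy(&ao));
    PetscCall(PetscFinalize());
    return 0;
  }
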
> And once I define the ordering so that the local vector and matrix are > defined in PETSC as in my application, how can I use it to create the > actual vector and matrix? > The vectors and matrices do not change. The AO is a permutation. You can use it to permute a vector into another order, or to convert on index to another. Thanks, Matt > Thanks in advance for the help. > > Cheers, > Enrico > > On 18/10/2023 13:39, Matthew Knepley wrote: > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > wrote: > > > > Hello, > > > > I'm trying to use Petsc to solve a linear system in an application. > I'm > > using the coordinate format to define the matrix and the vector (it > > should work better on GPU but at the moment every test is on CPU). > > After > > the call to VecSetValuesCOO, I've noticed that the vector is storing > > the > > data in a different way from my application. For example with two > > processes in the application > > > > process 0 owns cells 2, 3, 4 > > > > process 1 owns cells 0, 1, 5 > > > > But in the vector data structure of Petsc > > > > process 0 owns cells 0, 1, 2 > > > > process 1 owns cells 3, 4, 5 > > > > This is in principle not a big issue, but after solving the linear > > system I get the solution vector x and I want to get the values in > the > > correct processes. Is there a way to get vector values from other > > processes or to get a mapping so that I can do it myself? > > > > > > By definition, PETSc vectors and matrices own contiguous row blocks. If > > you want to have another, > > global ordering, we support that with > > https://petsc.org/main/manualpages/AO/ > > > > > > Thanks, > > > > Matt > > > > Cheers, > > Enrico Degregori > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From degregori at dkrz.de Thu Oct 19 07:57:39 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 14:57:39 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> Message-ID: <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> Maybe I wasn't clear enough. I would like to completely get rid of Petsc ordering because I don't want extra communication between processes to construct the vector and the matrix (since I have to fill them every time step because I'm just using the linear solver with a Mat and a Vec data structure). I don't understand how I can do that. My initial idea was to create another global index ordering within my application to use only for the Petsc interface but then I think that the ghost cells are wrong. On 19/10/2023 14:50, Matthew Knepley wrote: > On Thu, Oct 19, 2023 at 6:51?AM Enrico > wrote: > > Hello, > > if I create an application ordering using AOCreateBasic, should I > provide the same array for const PetscInt myapp[] and const PetscInt > mypetsc[] in order to get the same ordering of the application > within PETSC? 
> > > Are you asking if the identity permutation can be constructed using the > same array twice? Yes. > > And once I define the ordering so that the local vector and matrix are > defined in PETSC as in my application, how can I use it to create the > actual vector and matrix? > > > The vectors and matrices do not change. The AO is a permutation. You can > use it to permute > a vector into another order, or to convert on index to another. > > ? Thanks, > > ? ? ? Matt > > Thanks in advance for the help. > > Cheers, > Enrico > > On 18/10/2023 13:39, Matthew Knepley wrote: > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > >> wrote: > > > >? ? ?Hello, > > > >? ? ?I'm trying to use Petsc to solve a linear system in an > application. I'm > >? ? ?using the coordinate format to define the matrix and the > vector (it > >? ? ?should work better on GPU but at the moment every test is on > CPU). > >? ? ?After > >? ? ?the call to VecSetValuesCOO, I've noticed that the vector is > storing > >? ? ?the > >? ? ?data in a different way from my application. For example with two > >? ? ?processes in the application > > > >? ? ?process 0 owns cells 2, 3, 4 > > > >? ? ?process 1 owns cells 0, 1, 5 > > > >? ? ?But in the vector data structure of Petsc > > > >? ? ?process 0 owns cells 0, 1, 2 > > > >? ? ?process 1 owns cells 3, 4, 5 > > > >? ? ?This is in principle not a big issue, but after solving the > linear > >? ? ?system I get the solution vector x and I want to get the > values in the > >? ? ?correct processes. Is there a way to get vector values from other > >? ? ?processes or to get a mapping so that I can do it myself? > > > > > > By definition, PETSc vectors and matrices own contiguous row > blocks. If > > you want to have another, > > global ordering, we support that with > > https://petsc.org/main/manualpages/AO/ > > > > > > > >? ? Thanks, > > > >? ? ? ?Matt > > > >? ? ?Cheers, > >? ? ?Enrico Degregori > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 08:25:40 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 09:25:40 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 8:57?AM Enrico wrote: > Maybe I wasn't clear enough. I would like to completely get rid of Petsc > ordering because I don't want extra communication between processes to > construct the vector and the matrix (since I have to fill them every > time step because I'm just using the linear solver with a Mat and a Vec > data structure). I don't understand how I can do that. > Any program you write to do linear algebra will have contiguous storage because it is so much faster. Contiguous indexing makes sense for contiguous storage. If you want to use non-contiguous indexing for contiguous storage, you would need some translation layer. 
The AO is such a translation, but you could do this any way you want. Thanks, Matt > My initial idea was to create another global index ordering within my > application to use only for the Petsc interface but then I think that > the ghost cells are wrong. > > On 19/10/2023 14:50, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > wrote: > > > > Hello, > > > > if I create an application ordering using AOCreateBasic, should I > > provide the same array for const PetscInt myapp[] and const PetscInt > > mypetsc[] in order to get the same ordering of the application > > within PETSC? > > > > > > Are you asking if the identity permutation can be constructed using the > > same array twice? Yes. > > > > And once I define the ordering so that the local vector and matrix > are > > defined in PETSC as in my application, how can I use it to create the > > actual vector and matrix? > > > > > > The vectors and matrices do not change. The AO is a permutation. You can > > use it to permute > > a vector into another order, or to convert on index to another. > > > > Thanks, > > > > Matt > > > > Thanks in advance for the help. > > > > Cheers, > > Enrico > > > > On 18/10/2023 13:39, Matthew Knepley wrote: > > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > > > >> wrote: > > > > > > Hello, > > > > > > I'm trying to use Petsc to solve a linear system in an > > application. I'm > > > using the coordinate format to define the matrix and the > > vector (it > > > should work better on GPU but at the moment every test is on > > CPU). > > > After > > > the call to VecSetValuesCOO, I've noticed that the vector is > > storing > > > the > > > data in a different way from my application. For example with > two > > > processes in the application > > > > > > process 0 owns cells 2, 3, 4 > > > > > > process 1 owns cells 0, 1, 5 > > > > > > But in the vector data structure of Petsc > > > > > > process 0 owns cells 0, 1, 2 > > > > > > process 1 owns cells 3, 4, 5 > > > > > > This is in principle not a big issue, but after solving the > > linear > > > system I get the solution vector x and I want to get the > > values in the > > > correct processes. Is there a way to get vector values from > other > > > processes or to get a mapping so that I can do it myself? > > > > > > > > > By definition, PETSc vectors and matrices own contiguous row > > blocks. If > > > you want to have another, > > > global ordering, we support that with > > > https://petsc.org/main/manualpages/AO/ > > > > > > > > > > > > > Thanks, > > > > > > Matt > > > > > > Cheers, > > > Enrico Degregori > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From degregori at dkrz.de Thu Oct 19 09:51:41 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 16:51:41 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> Message-ID: <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> In the application the storage is contiguous but the global indexing is not. I would like to use AO as a translation layer but I don't understand it. My case is actually simple even if it is in a large application, I have Mat A, Vec b and Vec x After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to the data and they are in the wrong ordering, so for example the first element of the solution array on process 0 belongs to process 1 in the application. Is it at this point that I should use the AO translation layer? This would be quite bad, it means to build Mat A and Vec b there is MPI communication and also to get the data of Vec x back in the application. Anyway, I've tried to use AOPetscToApplicationPermuteReal on the solution array but it doesn't work as I would like. Is this function suppose to do MPI communication between processes and fetch the values of the application ordering? Cheers, Enrico On 19/10/2023 15:25, Matthew Knepley wrote: > On Thu, Oct 19, 2023 at 8:57?AM Enrico > wrote: > > Maybe I wasn't clear enough. I would like to completely get rid of > Petsc > ordering because I don't want extra communication between processes to > construct the vector and the matrix (since I have to fill them every > time step because I'm just using the linear solver with a Mat and a Vec > data structure). I don't understand how I can do that. > > > Any program you write to do linear algebra will have contiguous storage > because it > is so much faster. Contiguous indexing makes sense for contiguous > storage. If you > want to use non-contiguous indexing for contiguous storage, you would > need some > translation layer. The AO is such a translation, but you could do this > any way you want. > > ? Thanks, > > ? ? ?Matt > > My initial idea was to create another global index ordering within my > application to use only for the Petsc interface but then I think that > the ghost cells are wrong. > > On 19/10/2023 14:50, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > >> wrote: > > > >? ? ?Hello, > > > >? ? ?if I create an application ordering using AOCreateBasic, should I > >? ? ?provide the same array for const PetscInt myapp[] and const > PetscInt > >? ? ?mypetsc[] in order to get the same ordering of the application > >? ? ?within PETSC? > > > > > > Are you asking if the identity permutation can be constructed > using the > > same array twice? Yes. > > > >? ? ?And once I define the ordering so that the local vector and > matrix are > >? ? ?defined in PETSC as in my application, how can I use it to > create the > >? ? ?actual vector and matrix? > > > > > > The vectors and matrices do not change. The AO is a permutation. > You can > > use it to permute > > a vector into another order, or to convert on index to another. > > > >? ? Thanks, > > > >? ? ? ? Matt > > > >? ? ?Thanks in advance for the help. > > > >? ? ?Cheers, > >? ? ?Enrico > > > >? ? ?On 18/10/2023 13:39, Matthew Knepley wrote: > >? ? ? > On Wed, Oct 18, 2023 at 5:55?AM Enrico > >? ? ?> > >? ? ? > > >>> wrote: > >? ? ? > > >? ? ? >? ? ?Hello, > >? ? ? > > >? ? ? >? ? 
?I'm trying to use Petsc to solve a linear system in an > >? ? ?application. I'm > >? ? ? >? ? ?using the coordinate format to define the matrix and the > >? ? ?vector (it > >? ? ? >? ? ?should work better on GPU but at the moment every test > is on > >? ? ?CPU). > >? ? ? >? ? ?After > >? ? ? >? ? ?the call to VecSetValuesCOO, I've noticed that the > vector is > >? ? ?storing > >? ? ? >? ? ?the > >? ? ? >? ? ?data in a different way from my application. For > example with two > >? ? ? >? ? ?processes in the application > >? ? ? > > >? ? ? >? ? ?process 0 owns cells 2, 3, 4 > >? ? ? > > >? ? ? >? ? ?process 1 owns cells 0, 1, 5 > >? ? ? > > >? ? ? >? ? ?But in the vector data structure of Petsc > >? ? ? > > >? ? ? >? ? ?process 0 owns cells 0, 1, 2 > >? ? ? > > >? ? ? >? ? ?process 1 owns cells 3, 4, 5 > >? ? ? > > >? ? ? >? ? ?This is in principle not a big issue, but after > solving the > >? ? ?linear > >? ? ? >? ? ?system I get the solution vector x and I want to get the > >? ? ?values in the > >? ? ? >? ? ?correct processes. Is there a way to get vector values > from other > >? ? ? >? ? ?processes or to get a mapping so that I can do it myself? > >? ? ? > > >? ? ? > > >? ? ? > By definition, PETSc vectors and matrices own contiguous row > >? ? ?blocks. If > >? ? ? > you want to have another, > >? ? ? > global ordering, we support that with > >? ? ? > https://petsc.org/main/manualpages/AO/ > > >? ? ? > > >? ? ? > > >? ? ? >> > >? ? ? > > >? ? ? >? ? Thanks, > >? ? ? > > >? ? ? >? ? ? ?Matt > >? ? ? > > >? ? ? >? ? ?Cheers, > >? ? ? >? ? ?Enrico Degregori > >? ? ? > > >? ? ? > > >? ? ? > > >? ? ? > -- > >? ? ? > What most experimenters take for granted before they begin > their > >? ? ? > experiments is infinitely more interesting than any > results to which > >? ? ? > their experiments lead. > >? ? ? > -- Norbert Wiener > >? ? ? > > >? ? ? > https://www.cse.buffalo.edu/~knepley/ > > >? ? ? > > >? ? ? > >? ? ? >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 10:21:33 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 11:21:33 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 10:51?AM Enrico wrote: > In the application the storage is contiguous but the global indexing is > not. I would like to use AO as a translation layer but I don't > understand it. > Why would you choose to index differently from your storage? > My case is actually simple even if it is in a large application, I have > > Mat A, Vec b and Vec x > > After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to the > data and they are in the wrong ordering, so for example the first > element of the solution array on process 0 belongs to process 1 in the > application. 
> Again, this seems to be a poor choice of layout. What we typically do is to partition the data into chunks owned by each process first. > Is it at this point that I should use the AO translation layer? This > would be quite bad, it means to build Mat A and Vec b there is MPI > communication and also to get the data of Vec x back in the application. > If you want to store data that process i updates on process j, this will need communication. > Anyway, I've tried to use AOPetscToApplicationPermuteReal on the > solution array but it doesn't work as I would like. Is this function > suppose to do MPI communication between processes and fetch the values > of the application ordering? > There is no communication here. That function call just changes one integer into another. If you want to update values on another process, we recommend using VecScatter() or MatSetValues(), both of which take global indices and do communication if necessary. Thanks, Matt > Cheers, > Enrico > > On 19/10/2023 15:25, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 8:57?AM Enrico > > wrote: > > > > Maybe I wasn't clear enough. I would like to completely get rid of > > Petsc > > ordering because I don't want extra communication between processes > to > > construct the vector and the matrix (since I have to fill them every > > time step because I'm just using the linear solver with a Mat and a > Vec > > data structure). I don't understand how I can do that. > > > > > > Any program you write to do linear algebra will have contiguous storage > > because it > > is so much faster. Contiguous indexing makes sense for contiguous > > storage. If you > > want to use non-contiguous indexing for contiguous storage, you would > > need some > > translation layer. The AO is such a translation, but you could do this > > any way you want. > > > > Thanks, > > > > Matt > > > > My initial idea was to create another global index ordering within my > > application to use only for the Petsc interface but then I think that > > the ghost cells are wrong. > > > > On 19/10/2023 14:50, Matthew Knepley wrote: > > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > > > >> wrote: > > > > > > Hello, > > > > > > if I create an application ordering using AOCreateBasic, > should I > > > provide the same array for const PetscInt myapp[] and const > > PetscInt > > > mypetsc[] in order to get the same ordering of the application > > > within PETSC? > > > > > > > > > Are you asking if the identity permutation can be constructed > > using the > > > same array twice? Yes. > > > > > > And once I define the ordering so that the local vector and > > matrix are > > > defined in PETSC as in my application, how can I use it to > > create the > > > actual vector and matrix? > > > > > > > > > The vectors and matrices do not change. The AO is a permutation. > > You can > > > use it to permute > > > a vector into another order, or to convert on index to another. > > > > > > Thanks, > > > > > > Matt > > > > > > Thanks in advance for the help. > > > > > > Cheers, > > > Enrico > > > > > > On 18/10/2023 13:39, Matthew Knepley wrote: > > > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > > > > > > > > > > >>> wrote: > > > > > > > > Hello, > > > > > > > > I'm trying to use Petsc to solve a linear system in an > > > application. I'm > > > > using the coordinate format to define the matrix and > the > > > vector (it > > > > should work better on GPU but at the moment every test > > is on > > > CPU). 
> > > > After > > > > the call to VecSetValuesCOO, I've noticed that the > > vector is > > > storing > > > > the > > > > data in a different way from my application. For > > example with two > > > > processes in the application > > > > > > > > process 0 owns cells 2, 3, 4 > > > > > > > > process 1 owns cells 0, 1, 5 > > > > > > > > But in the vector data structure of Petsc > > > > > > > > process 0 owns cells 0, 1, 2 > > > > > > > > process 1 owns cells 3, 4, 5 > > > > > > > > This is in principle not a big issue, but after > > solving the > > > linear > > > > system I get the solution vector x and I want to get > the > > > values in the > > > > correct processes. Is there a way to get vector values > > from other > > > > processes or to get a mapping so that I can do it > myself? > > > > > > > > > > > > By definition, PETSc vectors and matrices own contiguous > row > > > blocks. If > > > > you want to have another, > > > > global ordering, we support that with > > > > https://petsc.org/main/manualpages/AO/ > > > > > > > > > > > > > > > > >> > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Cheers, > > > > Enrico Degregori > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin > > their > > > > experiments is infinitely more interesting than any > > results to which > > > > their experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > >> > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From degregori at dkrz.de Thu Oct 19 10:33:16 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 17:33:16 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: The layout is not poor, just the global indices are not contiguous,this has nothing to do with the local memory layout which is extremely optimized for different architectures. I can not change the layout anyway because it's a climate model with a million lines of code. I don't understand why Petsc is doing all this MPI communication under the hood. I mean, it is changing the layout of the application and doing a lot of communication. Is there no way to force the same layout and provide info about how to do the halo exchange? In this way I can have the same memory layout and there is no communication when I fill or fetch the vectors and the matrix. 
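To make it concrete, something along these lines is what I have in mind
(only a rough sketch with made-up sizes and ghost indices, assuming a
ghosted vector is the right tool on the PETSc side):

  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec         v, vlocal;
    PetscMPIInt rank;
    PetscInt    nlocal = 3;   /* cells this rank owns */
    PetscInt    ghosts[1];    /* global indices of the halo cells */

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL)); /* assumes at least 2 ranks */
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
    ghosts[0] = rank ? 2 : 3; /* one off-process halo cell per rank, made up */

    PetscCall(VecCreateGhost(PETSC_COMM_WORLD, nlocal, PETSC_DETERMINE, 1, ghosts, &v));
    PetscCall(VecSet(v, (PetscScalar)(rank + 1)));

    /* the halo exchange: owned values are copied into the ghost slots */
    PetscCall(VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_FORWARD));
    PetscCall(VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_FORWARD));

    /* local form = owned entries followed by the ghosts, one contiguous array */
    PetscCall(VecGhostGetLocalForm(v, &vlocal));
    PetscCall(VecGhostRestoreLocalForm(v, &vlocal));

    PetscCall(VecDestroy(&v));
    PetscCall(PetscFinalize());
    return 0;
  }

The ghost indices here are still in PETSc's contiguous global numbering, so
I guess this only answers the halo-exchange part and the renumbering is a
separate issue.
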
Cheers, Enrico On 19/10/2023 17:21, Matthew Knepley wrote: > On Thu, Oct 19, 2023 at 10:51?AM Enrico > wrote: > > In the application the storage is contiguous but the global indexing is > not. I would like to use AO as a translation layer but I don't > understand it. > > > Why would you choose to index differently from your storage? > > My case is actually simple even if it is in a large application, I have > > Mat A, Vec b and Vec x > > After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to > the > data and they are in the wrong ordering, so for example the first > element of the solution array on process 0 belongs to process 1 in the > application. > > > Again, this seems to be a poor choice of layout. What we typically do is > to partition > the data into chunks owned by each process first. > > Is it at this point that I should use the AO translation layer? This > would be quite bad, it means to build Mat A and Vec b there is MPI > communication and also to get the data of Vec x back in the application. > > > If you want to store data that process i updates on process j, this will > need communication. > > Anyway, I've tried to use AOPetscToApplicationPermuteReal on the > solution array but it doesn't work as I would like. Is this function > suppose to do MPI communication between processes and fetch the values > of the application ordering? > > > There is no communication here. That function call just changes one > integer into another. > If you want to update values on another process, we recommend using > VecScatter() or > MatSetValues(), both of which take global indices and do communication > if necessary. > > ? Thanks, > > ? ? Matt > > Cheers, > Enrico > > On 19/10/2023 15:25, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 8:57?AM Enrico > > >> wrote: > > > >? ? ?Maybe I wasn't clear enough. I would like to completely get > rid of > >? ? ?Petsc > >? ? ?ordering because I don't want extra communication between > processes to > >? ? ?construct the vector and the matrix (since I have to fill > them every > >? ? ?time step because I'm just using the linear solver with a Mat > and a Vec > >? ? ?data structure). I don't understand how I can do that. > > > > > > Any program you write to do linear algebra will have contiguous > storage > > because it > > is so much faster. Contiguous indexing makes sense for contiguous > > storage. If you > > want to use non-contiguous indexing for contiguous storage, you > would > > need some > > translation layer. The AO is such a translation, but you could do > this > > any way you want. > > > >? ? Thanks, > > > >? ? ? ?Matt > > > >? ? ?My initial idea was to create another global index ordering > within my > >? ? ?application to use only for the Petsc interface but then I > think that > >? ? ?the ghost cells are wrong. > > > >? ? ?On 19/10/2023 14:50, Matthew Knepley wrote: > >? ? ? > On Thu, Oct 19, 2023 at 6:51?AM Enrico > >? ? ?> > >? ? ? > > >>> wrote: > >? ? ? > > >? ? ? >? ? ?Hello, > >? ? ? > > >? ? ? >? ? ?if I create an application ordering using > AOCreateBasic, should I > >? ? ? >? ? ?provide the same array for const PetscInt myapp[] and > const > >? ? ?PetscInt > >? ? ? >? ? ?mypetsc[] in order to get the same ordering of the > application > >? ? ? >? ? ?within PETSC? > >? ? ? > > >? ? ? > > >? ? ? > Are you asking if the identity permutation can be constructed > >? ? ?using the > >? ? ? > same array twice? Yes. > >? ? ? > > >? ? ? >? ? ?And once I define the ordering so that the local > vector and > >? ? ?matrix are > >? 
? ? >? ? ?defined in PETSC as in my application, how can I use it to > >? ? ?create the > >? ? ? >? ? ?actual vector and matrix? > >? ? ? > > >? ? ? > > >? ? ? > The vectors and matrices do not change. The AO is a > permutation. > >? ? ?You can > >? ? ? > use it to permute > >? ? ? > a vector into another order, or to convert on index to > another. > >? ? ? > > >? ? ? >? ? Thanks, > >? ? ? > > >? ? ? >? ? ? ? Matt > >? ? ? > > >? ? ? >? ? ?Thanks in advance for the help. > >? ? ? > > >? ? ? >? ? ?Cheers, > >? ? ? >? ? ?Enrico > >? ? ? > > >? ? ? >? ? ?On 18/10/2023 13:39, Matthew Knepley wrote: > >? ? ? >? ? ? > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > >? ? ?> > >? ? ? >? ? ? > >> > >? ? ? >? ? ? > > > >? ? ? > >>>> wrote: > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Hello, > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?I'm trying to use Petsc to solve a linear > system in an > >? ? ? >? ? ?application. I'm > >? ? ? >? ? ? >? ? ?using the coordinate format to define the > matrix and the > >? ? ? >? ? ?vector (it > >? ? ? >? ? ? >? ? ?should work better on GPU but at the moment > every test > >? ? ?is on > >? ? ? >? ? ?CPU). > >? ? ? >? ? ? >? ? ?After > >? ? ? >? ? ? >? ? ?the call to VecSetValuesCOO, I've noticed that the > >? ? ?vector is > >? ? ? >? ? ?storing > >? ? ? >? ? ? >? ? ?the > >? ? ? >? ? ? >? ? ?data in a different way from my application. For > >? ? ?example with two > >? ? ? >? ? ? >? ? ?processes in the application > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 0 owns cells 2, 3, 4 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 1 owns cells 0, 1, 5 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?But in the vector data structure of Petsc > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 0 owns cells 0, 1, 2 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?process 1 owns cells 3, 4, 5 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?This is in principle not a big issue, but after > >? ? ?solving the > >? ? ? >? ? ?linear > >? ? ? >? ? ? >? ? ?system I get the solution vector x and I want > to get the > >? ? ? >? ? ?values in the > >? ? ? >? ? ? >? ? ?correct processes. Is there a way to get vector > values > >? ? ?from other > >? ? ? >? ? ? >? ? ?processes or to get a mapping so that I can do > it myself? > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > By definition, PETSc vectors and matrices own > contiguous row > >? ? ? >? ? ?blocks. If > >? ? ? >? ? ? > you want to have another, > >? ? ? >? ? ? > global ordering, we support that with > >? ? ? >? ? ? > https://petsc.org/main/manualpages/AO/ > > >? ? ? > > >? ? ? >? ? ? > >? ? ? >> > >? ? ? >? ? ? > > >? ? ? > > >? ? ? >? ? ? > >? ? ? >>> > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? Thanks, > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ?Matt > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Cheers, > >? ? ? >? ? ? >? ? ?Enrico Degregori > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > -- > >? ? ? >? ? ? > What most experimenters take for granted before > they begin > >? ? ?their > >? ? ? >? ? ? > experiments is infinitely more interesting than any > >? ? ?results to which > >? ? ? >? ? ? > their experiments lead. > >? ? ? >? ? ? > -- Norbert Wiener > >? ? ? >? ? ? > > >? ? ? >? ? ? > https://www.cse.buffalo.edu/~knepley/ > > >? ? ? > > >? ? ? >? ? ? > >? ? ? >> > >? ? ? >? ? ? > >? ? ? > > >? ? ? >? ? ? > >? ? ? >>> > >? ? ? > > >? ? ? > > >? ? ? > > >? ? ? > -- > >? ? ? > What most experimenters take for granted before they begin > their > >? ? ? > experiments is infinitely more interesting than any > results to which > >? ? ? 
> their experiments lead. > >? ? ? > -- Norbert Wiener > >? ? ? > > >? ? ? > https://www.cse.buffalo.edu/~knepley/ > > >? ? ? > > >? ? ? > >? ? ? >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Thu Oct 19 10:36:45 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Oct 2023 11:36:45 -0400 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: On Thu, Oct 19, 2023 at 11:33?AM Enrico wrote: > The layout is not poor, just the global indices are not contiguous,this > has nothing to do with the local memory layout which is extremely > optimized for different architectures. I can not change the layout > anyway because it's a climate model with a million lines of code. > > I don't understand why Petsc is doing all this MPI communication under > the hood. I don't think we are communicating under the hood. > I mean, it is changing the layout of the application and doing > a lot of communication. We do not create the layout. The user creates the data layout when they create a vector or matrix. > Is there no way to force the same layout and > provide info about how to do the halo exchange? In this way I can have > the same memory layout and there is no communication when I fill or > fetch the vectors and the matrix. > Yes, you tell the vector/matrix your data layout when you create it. Thanks, Matt > Cheers, > Enrico > > On 19/10/2023 17:21, Matthew Knepley wrote: > > On Thu, Oct 19, 2023 at 10:51?AM Enrico > > wrote: > > > > In the application the storage is contiguous but the global indexing > is > > not. I would like to use AO as a translation layer but I don't > > understand it. > > > > > > Why would you choose to index differently from your storage? > > > > My case is actually simple even if it is in a large application, I > have > > > > Mat A, Vec b and Vec x > > > > After calling KSPSolve, I use VecGetArrayReadF90 to get a pointer to > > the > > data and they are in the wrong ordering, so for example the first > > element of the solution array on process 0 belongs to process 1 in > the > > application. > > > > > > Again, this seems to be a poor choice of layout. What we typically do is > > to partition > > the data into chunks owned by each process first. > > > > Is it at this point that I should use the AO translation layer? This > > would be quite bad, it means to build Mat A and Vec b there is MPI > > communication and also to get the data of Vec x back in the > application. > > > > > > If you want to store data that process i updates on process j, this will > > need communication. > > > > Anyway, I've tried to use AOPetscToApplicationPermuteReal on the > > solution array but it doesn't work as I would like. Is this function > > suppose to do MPI communication between processes and fetch the > values > > of the application ordering? > > > > > > There is no communication here. 
That function call just changes one > > integer into another. > > If you want to update values on another process, we recommend using > > VecScatter() or > > MatSetValues(), both of which take global indices and do communication > > if necessary. > > > > Thanks, > > > > Matt > > > > Cheers, > > Enrico > > > > On 19/10/2023 15:25, Matthew Knepley wrote: > > > On Thu, Oct 19, 2023 at 8:57?AM Enrico > > > > >> wrote: > > > > > > Maybe I wasn't clear enough. I would like to completely get > > rid of > > > Petsc > > > ordering because I don't want extra communication between > > processes to > > > construct the vector and the matrix (since I have to fill > > them every > > > time step because I'm just using the linear solver with a Mat > > and a Vec > > > data structure). I don't understand how I can do that. > > > > > > > > > Any program you write to do linear algebra will have contiguous > > storage > > > because it > > > is so much faster. Contiguous indexing makes sense for contiguous > > > storage. If you > > > want to use non-contiguous indexing for contiguous storage, you > > would > > > need some > > > translation layer. The AO is such a translation, but you could do > > this > > > any way you want. > > > > > > Thanks, > > > > > > Matt > > > > > > My initial idea was to create another global index ordering > > within my > > > application to use only for the Petsc interface but then I > > think that > > > the ghost cells are wrong. > > > > > > On 19/10/2023 14:50, Matthew Knepley wrote: > > > > On Thu, Oct 19, 2023 at 6:51?AM Enrico > > > > > > > > > > > >>> wrote: > > > > > > > > Hello, > > > > > > > > if I create an application ordering using > > AOCreateBasic, should I > > > > provide the same array for const PetscInt myapp[] and > > const > > > PetscInt > > > > mypetsc[] in order to get the same ordering of the > > application > > > > within PETSC? > > > > > > > > > > > > Are you asking if the identity permutation can be > constructed > > > using the > > > > same array twice? Yes. > > > > > > > > And once I define the ordering so that the local > > vector and > > > matrix are > > > > defined in PETSC as in my application, how can I use > it to > > > create the > > > > actual vector and matrix? > > > > > > > > > > > > The vectors and matrices do not change. The AO is a > > permutation. > > > You can > > > > use it to permute > > > > a vector into another order, or to convert on index to > > another. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Thanks in advance for the help. > > > > > > > > Cheers, > > > > Enrico > > > > > > > > On 18/10/2023 13:39, Matthew Knepley wrote: > > > > > On Wed, Oct 18, 2023 at 5:55?AM Enrico > > > > > > > > > > > > >> > > > > > > > > > > > > > >>>> wrote: > > > > > > > > > > Hello, > > > > > > > > > > I'm trying to use Petsc to solve a linear > > system in an > > > > application. I'm > > > > > using the coordinate format to define the > > matrix and the > > > > vector (it > > > > > should work better on GPU but at the moment > > every test > > > is on > > > > CPU). > > > > > After > > > > > the call to VecSetValuesCOO, I've noticed that > the > > > vector is > > > > storing > > > > > the > > > > > data in a different way from my application. 
For > > > example with two > > > > > processes in the application > > > > > > > > > > process 0 owns cells 2, 3, 4 > > > > > > > > > > process 1 owns cells 0, 1, 5 > > > > > > > > > > But in the vector data structure of Petsc > > > > > > > > > > process 0 owns cells 0, 1, 2 > > > > > > > > > > process 1 owns cells 3, 4, 5 > > > > > > > > > > This is in principle not a big issue, but after > > > solving the > > > > linear > > > > > system I get the solution vector x and I want > > to get the > > > > values in the > > > > > correct processes. Is there a way to get vector > > values > > > from other > > > > > processes or to get a mapping so that I can do > > it myself? > > > > > > > > > > > > > > > By definition, PETSc vectors and matrices own > > contiguous row > > > > blocks. If > > > > > you want to have another, > > > > > global ordering, we support that with > > > > > https://petsc.org/main/manualpages/AO/ > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > Cheers, > > > > > Enrico Degregori > > > > > > > > > > > > > > > > > > > > -- > > > > > What most experimenters take for granted before > > they begin > > > their > > > > > experiments is infinitely more interesting than any > > > results to which > > > > > their experiments lead. > > > > > -- Norbert Wiener > > > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin > > their > > > > experiments is infinitely more interesting than any > > results to which > > > > their experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > >> > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to > which > > > their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From degregori at dkrz.de Thu Oct 19 11:28:17 2023 From: degregori at dkrz.de (Enrico) Date: Thu, 19 Oct 2023 18:28:17 +0200 Subject: [petsc-users] Coordinate format internal reordering In-Reply-To: References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de> <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de> <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de> <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de> Message-ID: I make an example. If I have a vector with global indices {0,1,2,3,4,5} and process 0 owns {2,3,4} while process 1 owns {0,1,5}, the resulting vector data structure on Petsc on process 0 owns {0,1,2}. This means that the points {0,1} has been sent from process 1 to process 0. I would like to have {2,3,4} on process 0 also in Petsc. Is it more clear now? 
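If I understand the AO correctly, the renumbering would have to look
something like this (just a sketch using the numbers above, not tested):

  #include <petscao.h>

  int main(int argc, char **argv)
  {
    AO          ao;
    PetscMPIInt rank;
    /* application numbering of the locally owned cells (the example above) */
    PetscInt app0[3] = {2, 3, 4}, app1[3] = {0, 1, 5};
    /* the contiguous PETSc rows each process owns instead */
    PetscInt pet0[3] = {0, 1, 2}, pet1[3] = {3, 4, 5};
    PetscInt idx[3];

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL)); /* run with mpiexec -n 2 */
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

    PetscCall(AOCreateBasic(PETSC_COMM_WORLD, 3, rank ? app1 : app0, rank ? pet1 : pet0, &ao));

    /* translate application cell numbers into PETSc rows before calling
       MatSetValues()/VecSetValues(); the cell data itself stays on its rank */
    for (PetscInt i = 0; i < 3; i++) idx[i] = rank ? app1[i] : app0[i];
    PetscCall(AOApplicationToPetsc(ao, 3, idx));
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "[%d] PETSc rows %d %d %d\n", rank, (int)idx[0], (int)idx[1], (int)idx[2]));

    PetscCall(AODestroy(&ao));
    PetscCall(PetscFinalize());
    return 0;
  }

With that, process 0 would keep cells 2, 3, 4 in its memory, they would
just be called rows 0, 1, 2 on the PETSc side.
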
On 19/10/2023 17:36, Matthew Knepley wrote:
> On Thu, Oct 19, 2023 at 11:33 AM Enrico wrote:
>
>     The layout is not poor, just the global indices are not contiguous;
>     this has nothing to do with the local memory layout, which is
>     extremely optimized for different architectures. I can not change
>     the layout anyway because it's a climate model with a million lines
>     of code.
>
>     I don't understand why Petsc is doing all this MPI communication
>     under the hood.
>
> I don't think we are communicating under the hood.
>
>     I mean, it is changing the layout of the application and doing
>     a lot of communication.
>
> We do not create the layout. The user creates the data layout when they
> create a vector or matrix.
>
>     Is there no way to force the same layout and provide info about how
>     to do the halo exchange? In this way I can have the same memory
>     layout and there is no communication when I fill or fetch the
>     vectors and the matrix.
>
> Yes, you tell the vector/matrix your data layout when you create it.
>
>   Thanks,
>
>      Matt
From degregori at dkrz.de  Thu Oct 19 12:00:44 2023
From: degregori at dkrz.de (Enrico)
Date: Thu, 19 Oct 2023 19:00:44 +0200
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: 
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
Message-ID: <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>

Here is a very very simple reproducer of my problem. It is a Fortran
program and it has to run with 2 processes.

The output is:

  process 0 : xx_v( 1 ) = 0.000000000000000
  process 0 : xx_v( 2 ) = 1.000000000000000
  process 0 : xx_v( 3 ) = 2.000000000000000
  process 1 : xx_v( 1 ) = 3.000000000000000
  process 1 : xx_v( 2 ) = 4.000000000000000
  process 1 : xx_v( 3 ) = 5.000000000000000

and I would like to have:

  process 0 : xx_v( 1 ) = 2.000000000000000
  process 0 : xx_v( 2 ) = 3.000000000000000
  process 0 : xx_v( 3 ) = 4.000000000000000
  process 1 : xx_v( 1 ) = 0.000000000000000
  process 1 : xx_v( 2 ) = 1.000000000000000
  process 1 : xx_v( 3 ) = 5.000000000000000

How can I do that?

program main
#include <petsc/finclude/petscksp.h>
    use petscksp
    implicit none

    PetscErrorCode ierr
    PetscInt  :: Psize = 6
    integer   :: Lsize
    PetscInt  :: work_size
    PetscInt  :: work_rank
    Vec :: b
    integer, allocatable, dimension(:) :: glb_index
    double precision, allocatable, dimension(:) :: array
    PetscScalar, pointer :: xx_v(:)
    integer :: i
    PetscCount :: csize

    CALL PetscInitialize(ierr)

    Lsize = 3
    csize = Lsize

    allocate(glb_index(0:Lsize-1), array(0:Lsize-1))

    CALL MPI_Comm_size(PETSC_COMM_WORLD, work_size, ierr);
    CALL MPI_Comm_rank(PETSC_COMM_WORLD, work_rank, ierr);
    if (work_rank == 0) then
      glb_index(0) = 2
      glb_index(1) = 3
      glb_index(2) = 4
      array(0) = 2
      array(1) = 3
      array(2) = 4
    else if (work_rank == 1) then
      glb_index(0) = 0
      glb_index(1) = 1
      glb_index(2) = 5
      array(0) = 0
      array(1) = 1
      array(2) = 5
    end if

    ! Create and fill rhs vector
    CALL VecCreate(PETSC_COMM_WORLD, b, ierr);
    CALL VecSetSizes(b, Lsize, Psize, ierr);
    CALL VecSetType(b, VECMPI, ierr);

    CALL VecSetPreallocationCOO(b, csize, glb_index, ierr)
    CALL VecSetValuesCOO(b, array, INSERT_VALUES, ierr)

    CALL VecGetArrayReadF90(b, xx_v, ierr)

    do i=1,Lsize
      write(*,*) 'process ', work_rank, ': xx_v(',i,') = ', xx_v(i)
    end do

    CALL VecRestoreArrayReadF90(b, xx_v, ierr)

    deallocate(glb_index, array)
    CALL VecDestroy(b,ierr)

    CALL PetscFinalize(ierr)

end program main
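The contiguous ordering in the output above comes from the vector's row
ownership, not from the COO indices: each rank owns the block of rows given
by the local size passed to VecSetSizes (or VecCreateMPI), so with 3 local
entries per rank the ownership is rows 0-2 on rank 0 and rows 3-5 on rank 1,
whatever indices are later handed to VecSetPreallocationCOO. A minimal
illustrative sketch of such a check, run with 2 processes like the
reproducer (the variable names rstart and rend are arbitrary):

program ownership
#include <petsc/finclude/petscvec.h>
    use petscvec
    implicit none

    PetscErrorCode :: ierr
    PetscInt       :: Lsize = 3, Psize = 6
    PetscInt       :: rstart, rend
    PetscMPIInt    :: rank
    Vec            :: b

    call PetscInitialize(ierr)
    call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)

    ! Same sizes as in the reproducer: 3 local rows, 6 global rows.
    call VecCreateMPI(PETSC_COMM_WORLD, Lsize, Psize, b, ierr)

    ! Each rank owns a contiguous block of rows, [0,3) and [3,6) here,
    ! independent of the indices later given to VecSetPreallocationCOO.
    call VecGetOwnershipRange(b, rstart, rend, ierr)
    write(*,*) 'rank', rank, 'owns rows', rstart, 'to', rend - 1

    call VecDestroy(b, ierr)
    call PetscFinalize(ierr)
end program ownership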
From knepley at gmail.com  Thu Oct 19 12:43:17 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 19 Oct 2023 13:43:17 -0400
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
Message-ID: 

On Thu, Oct 19, 2023 at 1:00 PM Enrico wrote:

> Here is a very very simple reproducer of my problem. It is a Fortran
> program and it has to run with 2 processes.

You seem to be saying that you start with one partition of your data, but
you would like another partition. For this, you have to initially
communicate. For this I would use VecScatter. However, since most data is
generated, I would consider not generating my data in that initial
distribution.

There are many examples in the repository. In the discretization of a PDE,
we first divide the domain, then number up each piece, then assemble the
linear algebra objects.

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
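A minimal sketch of the VecScatter approach suggested above, applied to the
six-entry reproducer: an index set holding each rank's application-global
indices is used to scatter from the parallel vector into a rank-local
vector, so the values come back in the application's own ordering. This is
an illustrative sketch only; the names is_app, b_app and ctx are made up,
and glb_index is declared as PetscInt here so it can be passed to
ISCreateGeneral.

program coo_scatter
#include <petsc/finclude/petscvec.h>
    use petscvec
    implicit none

    PetscErrorCode       :: ierr
    PetscInt             :: Lsize = 3, Psize = 6
    PetscInt             :: i
    PetscInt             :: glb_index(3)
    PetscScalar          :: vals(3)
    PetscScalar, pointer :: app_v(:)
    PetscCount           :: csize
    PetscMPIInt          :: rank
    Vec                  :: b, b_app
    IS                   :: is_app
    VecScatter           :: ctx

    call PetscInitialize(ierr)
    call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)
    csize = Lsize

    ! Application ownership from the reproducer: rank 0 has cells 2,3,4
    ! and rank 1 has cells 0,1,5; the value of each cell is its index.
    if (rank == 0) then
       glb_index = [2, 3, 4]
    else
       glb_index = [0, 1, 5]
    end if
    do i = 1, Lsize
       vals(i) = glb_index(i)
    end do

    ! Parallel vector filled through the COO interface; internally PETSc
    ! still owns contiguous row blocks ([0,3) and [3,6)).
    call VecCreateMPI(PETSC_COMM_WORLD, Lsize, Psize, b, ierr)
    call VecSetPreallocationCOO(b, csize, glb_index, ierr)
    call VecSetValuesCOO(b, vals, INSERT_VALUES, ierr)

    ! Scatter b(glb_index(i)) into entry i of a local vector, i.e. bring
    ! each rank's application-owned values back in application order.
    call ISCreateGeneral(PETSC_COMM_WORLD, Lsize, glb_index, &
                         PETSC_COPY_VALUES, is_app, ierr)
    call VecCreateSeq(PETSC_COMM_SELF, Lsize, b_app, ierr)
    call VecScatterCreate(b, is_app, b_app, PETSC_NULL_IS, ctx, ierr)
    call VecScatterBegin(ctx, b, b_app, INSERT_VALUES, SCATTER_FORWARD, ierr)
    call VecScatterEnd(ctx, b, b_app, INSERT_VALUES, SCATTER_FORWARD, ierr)

    ! Prints 2,3,4 on rank 0 and 0,1,5 on rank 1.
    call VecGetArrayReadF90(b_app, app_v, ierr)
    do i = 1, Lsize
       write(*,*) 'rank', rank, ': app_v(', i, ') =', app_v(i)
    end do
    call VecRestoreArrayReadF90(b_app, app_v, ierr)

    call VecScatterDestroy(ctx, ierr)
    call ISDestroy(is_app, ierr)
    call VecDestroy(b_app, ierr)
    call VecDestroy(b, ierr)
    call PetscFinalize(ierr)
end program coo_scatter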
From degregori at dkrz.de  Thu Oct 19 12:46:04 2023
From: degregori at dkrz.de (Enrico)
Date: Thu, 19 Oct 2023 19:46:04 +0200
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: 
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
Message-ID: <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>

Sorry, but I don't want another partition; PETSc internally is changing
the partition. I would like to have the same partition that I have in
the application. Is the example not clear?
From knepley at gmail.com  Thu Oct 19 13:41:43 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 19 Oct 2023 14:41:43 -0400
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To: <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
 <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>
Message-ID: 

On Thu, Oct 19, 2023 at 1:46 PM Enrico wrote:

> Sorry but I don't want another partition, Petsc internally is changing
> the partition. I would like to have the same partition that I have in
> the application. Is the example not clear?

"Petsc internally is changing the partition": this is not correct. PETSc
does not prescribe partitions. I refer to the documentation for the
creation of a Vec:

  https://petsc.org/main/manualpages/Vec/VecCreateMPI/

Here the user sets the local and global sizes. Since data is contiguous,
these completely define the vector, and they are under user control.

  Thanks,

     Matt

> On 19/10/2023 19:43, Matthew Knepley wrote:
> > On Thu, Oct 19, 2023 at 1:00 PM Enrico wrote:
> >
> >     Here is a very very simple reproducer of my problem. It is a
> >     Fortran program and it has to run with 2 processes.
> >
> > You seem to be saying that you start with one partition of your data,
> > but you would like another partition. For this, you have to initially
> > communicate. For this I would use VecScatter. However, since most data
> > is generated, I would consider not generating my data in that initial
> > distribution.
> >
> > There are many examples in the repository. In the discretization of a
> > PDE, we first divide the domain, then number up each piece, then
> > assemble the linear algebra objects.
> >
> >   Thanks,
> >
> >      Matt
> >
> > The output is:
> >
> >   process 0 : xx_v( 1 ) = 0.000000000000000
> >   process 0 : xx_v( 2 ) = 1.000000000000000
> >   process 0 : xx_v( 3 ) = 2.000000000000000
> >   process 1 : xx_v( 1 ) = 3.000000000000000
> >   process 1 : xx_v( 2 ) = 4.000000000000000
> >   process 1 : xx_v( 3 ) = 5.000000000000000
> >
> > and I would like to have:
> >
> >   process 0 : xx_v( 1 ) = 2.000000000000000
> >   process 0 : xx_v( 2 ) = 3.000000000000000
> >   process 0 : xx_v( 3 ) = 4.000000000000000
> >   process 1 : xx_v( 1 ) = 0.000000000000000
> >   process 1 : xx_v( 2 ) = 1.000000000000000
> >   process 1 : xx_v( 3 ) = 5.000000000000000
> >
> > How can I do that?
> >     program main
> >     #include <petsc/finclude/petscksp.h>
> >           use petscksp
> >           implicit none
> >
> >           PetscErrorCode ierr
> >           PetscInt  :: Psize = 6
> >           integer   :: Lsize
> >           PetscInt  :: work_size
> >           PetscInt  :: work_rank
> >           Vec :: b
> >           integer, allocatable, dimension(:) :: glb_index
> >           double precision, allocatable, dimension(:) :: array
> >           PetscScalar, pointer :: xx_v(:)
> >           integer :: i
> >           PetscCount :: csize
> >
> >           CALL PetscInitialize(ierr)
> >
> >           Lsize = 3
> >           csize = Lsize
> >
> >           allocate(glb_index(0:Lsize-1), array(0:Lsize-1))
> >
> >           CALL MPI_Comm_size(PETSC_COMM_WORLD, work_size, ierr)
> >           CALL MPI_Comm_rank(PETSC_COMM_WORLD, work_rank, ierr)
> >           if (work_rank == 0) then
> >             glb_index(0) = 2
> >             glb_index(1) = 3
> >             glb_index(2) = 4
> >             array(0) = 2
> >             array(1) = 3
> >             array(2) = 4
> >           else if (work_rank == 1) then
> >             glb_index(0) = 0
> >             glb_index(1) = 1
> >             glb_index(2) = 5
> >             array(0) = 0
> >             array(1) = 1
> >             array(2) = 5
> >           end if
> >
> >           ! Create and fill rhs vector
> >           CALL VecCreate(PETSC_COMM_WORLD, b, ierr)
> >           CALL VecSetSizes(b, Lsize, Psize, ierr)
> >           CALL VecSetType(b, VECMPI, ierr)
> >
> >           CALL VecSetPreallocationCOO(b, csize, glb_index, ierr)
> >           CALL VecSetValuesCOO(b, array, INSERT_VALUES, ierr)
> >
> >           CALL VecGetArrayReadF90(b, xx_v, ierr)
> >
> >           do i=1,Lsize
> >             write(*,*) 'process ', work_rank, ': xx_v(',i,') = ', xx_v(i)
> >           end do
> >
> >           CALL VecRestoreArrayReadF90(b, xx_v, ierr)
> >
> >           deallocate(glb_index, array)
> >           CALL VecDestroy(b,ierr)
> >
> >           CALL PetscFinalize(ierr)
> >
> >     end program main
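A minimal sketch in C of the VecScatter approach mentioned in the quoted reply
above: scatter the PETSc-ordered solution into a second vector whose local
block follows the application's own global numbering. The sizes and index
lists are hypothetical and simply mirror the Fortran reproducer; this is an
illustration, not code from the thread.

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         x_petsc, x_app;
  IS          from, to;
  VecScatter  scat;
  PetscInt    lsize = 3, gsize = 6, rstart;
  /* Application's global indices on each rank (mirroring the reproducer:
     rank 0 works on entries 2,3,4 and rank 1 on entries 0,1,5). */
  PetscInt    idx0[3] = {2, 3, 4}, idx1[3] = {0, 1, 5};
  PetscMPIInt rank;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  /* x_petsc: the solution in PETSc's contiguous layout (e.g. from KSPSolve). */
  PetscCall(VecCreateMPI(PETSC_COMM_WORLD, lsize, gsize, &x_petsc));
  /* x_app: same sizes, but its local block will be filled in application order. */
  PetscCall(VecCreateMPI(PETSC_COMM_WORLD, lsize, gsize, &x_app));
  PetscCall(VecGetOwnershipRange(x_app, &rstart, NULL));

  /* from: which global entries of x_petsc this rank wants;
     to:   where they land in x_app (this rank's own contiguous range). */
  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, lsize, rank ? idx1 : idx0, PETSC_COPY_VALUES, &from));
  PetscCall(ISCreateStride(PETSC_COMM_WORLD, lsize, rstart, 1, &to));

  PetscCall(VecScatterCreate(x_petsc, from, x_app, to, &scat));
  PetscCall(VecScatterBegin(scat, x_petsc, x_app, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(scat, x_petsc, x_app, INSERT_VALUES, SCATTER_FORWARD));

  PetscCall(VecScatterDestroy(&scat));
  PetscCall(ISDestroy(&from));
  PetscCall(ISDestroy(&to));
  PetscCall(VecDestroy(&x_app));
  PetscCall(VecDestroy(&x_petsc));
  PetscCall(PetscFinalize());
  return 0;
}

The scatter context can be created once and reused after every solve.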
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jorgenin at mit.edu  Thu Oct 19 14:46:35 2023
From: jorgenin at mit.edu (Jorge Nin)
Date: Thu, 19 Oct 2023 19:46:35 +0000
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
Message-ID: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>

Hi,
I was playing around with a self-compiled version and the Conda binary of
PETSc on the same problem, on my M1 Mac. Interestingly, I found that the
Conda binary solves the problem 2-3 times slower than the self-compiled
version. (For context, I'm using the petsc4py python interface.)

I've attached two log views to show the comparison.

I was mostly curious about the possible cause for this.
I was also curious how I could use my own compiled version of PETSc in my
Conda install?

Best,
Jorge
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Selfcompiled.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Conda Version.txt
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1862 bytes
Desc: not available
URL: 

From knepley at gmail.com  Thu Oct 19 15:00:47 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 19 Oct 2023 16:00:47 -0400
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
Message-ID:

On Thu, Oct 19, 2023 at 3:54 PM Jorge Nin wrote:

> Hi,
> I was playing around with a self-compiled version and the Conda binary of
> PETSc on the same problem, on my M1 Mac. Interestingly, I found that the
> Conda binary solves the problem 2-3 times slower than the self-compiled
> version. (For context, I'm using the petsc4py python interface.)
>
> I've attached two log views to show the comparison.
>
> I was mostly curious about the possible cause for this.
>

All the time is in the LU numeric factorization. I don't know if your
matrix is sparse or dense. I am guessing it is dense and different LAPACK
implementations are linked. If it is sparse, then the compiler options are
different between builds, but I would be surprised if it made this much
difference.

  Thanks,

    Matt

> I was also curious how I could use my own compiled version of PETSc in my
> Conda install?
>
>
> Best,
> Jorge
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
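A small C sketch of one way to answer the sparse-or-dense question above:
query the assembled matrix for its nonzero count with MatGetInfo and compare
against the dense size. The helper name is made up for illustration and is
not taken from the attached logs; petsc4py exposes the same information
through its Mat object.

#include <petscmat.h>

/* Report the nonzero fraction of an assembled Mat A (A is assumed to come
   from the user's own assembly code, which is not shown here). */
static PetscErrorCode ReportFillRatio(Mat A)
{
  MatInfo  info;
  PetscInt M, N;

  PetscFunctionBeginUser;
  PetscCall(MatGetSize(A, &M, &N));
  PetscCall(MatGetInfo(A, MAT_GLOBAL_SUM, &info));
  PetscCall(PetscPrintf(PetscObjectComm((PetscObject)A),
                        "nonzeros used: %g of %g entries (%.3f %%)\n",
                        info.nz_used, (double)M * (double)N,
                        100.0 * info.nz_used / ((double)M * (double)N)));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Running with -ksp_view also shows which factorization package and matrix
type end up being used in each build.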
From jorgenin at mit.edu  Thu Oct 19 19:35:12 2023
From: jorgenin at mit.edu (Jorge Nin)
Date: Fri, 20 Oct 2023 00:35:12 +0000
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To:
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
Message-ID: <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>

Hi Matthew,

Thanks for the response. It actually seems like the matrix is very sparse
(only about 0.99% of the entries are nonzero, from what I'm measuring).
It's an FEA solver, so it would make sense.
My current guess is that the optimization flags are making a large
difference for the M1 Mac, but I am also surprised it makes such a huge
difference.

It's why I was asking if there was a resource or some other way to use my
own version of PETSc with Conda.
I believe a 2-3x speed up is worth the hassle.

Best,
Jorge
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1862 bytes
Desc: not available
URL: 

From degregori at dkrz.de  Fri Oct 20 02:05:41 2023
From: degregori at dkrz.de (Enrico)
Date: Fri, 20 Oct 2023 09:05:41 +0200
Subject: [petsc-users] Coordinate format internal reordering
In-Reply-To:
References: <1e2044e9-d1ca-42b5-0377-495a1425af46@dkrz.de>
 <5d27c8e3-0d15-b816-0f61-cfad36d0cefa@dkrz.de>
 <1dfc73c6-babd-f27a-6ee1-aef9cfd9c3ae@dkrz.de>
 <81f45928-dd0c-4497-00bf-215c8b055b5b@dkrz.de>
 <84025e0f-62d8-0fd8-b9cb-1f279e22703c@dkrz.de>
 <3e38435f-0f57-1f1f-20ca-a66d529a0387@dkrz.de>
Message-ID: <17f581e3-ebd1-0c2e-4cb8-4810ba811ac2@dkrz.de>

Yes, but in the documentation it should be clear that if the global indices
are not contiguous in each process, even if the memory is locally contiguous
in each process, then there will be communication to build the matrix and
the vector.

Cheers,
Enrico
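A short C sketch of the point above, with a hypothetical 6-entry vector using
the same index lists as earlier in this thread: when a rank sets values for
global indices it does not own, the MPI traffic happens inside the assembly
(or COO set) calls. This is an illustration only, not code from the thread.

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         b;
  PetscMPIInt rank;
  /* Hypothetical application ordering: rank 0 holds global entries 2,3,4
     and rank 1 holds 0,1,5 (PETSc owns 0,1,2 and 3,4,5 respectively). */
  PetscInt    idx0[3] = {2, 3, 4}, idx1[3] = {0, 1, 5};
  PetscScalar val0[3] = {2, 3, 4}, val1[3] = {0, 1, 5};

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  PetscCall(VecCreateMPI(PETSC_COMM_WORLD, 3, 6, &b));

  /* Each rank inserts with its own (non-contiguous) global indices;
     entries owned by another rank are stashed locally ... */
  PetscCall(VecSetValues(b, 3, rank ? idx1 : idx0, rank ? val1 : val0, INSERT_VALUES));
  /* ... and moved to their owners here: this is where the communication occurs. */
  PetscCall(VecAssemblyBegin(b));
  PetscCall(VecAssemblyEnd(b));

  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}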
From knepley at gmail.com  Fri Oct 20 06:12:30 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 20 Oct 2023 07:12:30 -0400
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To: <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
 <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
Message-ID:

On Thu, Oct 19, 2023 at 8:35 PM Jorge Nin wrote:

> Hi Matthew,
>
> Thanks for the response. It actually seems like the matrix is very sparse
> (only about 0.99% of the entries are nonzero, from what I'm measuring).
> It's an FEA solver, so it would make sense.
> My current guess is that the optimization flags are making a large
> difference for the M1 Mac, but I am also surprised it makes such a huge
> difference.
>
> It's why I was asking if there was a resource or some other way to use my
> own version of PETSc with Conda.
>

We do not know how Conda works unfortunately.

  Thanks,

    Matt

> I believe a 2-3x speed up is worth the hassle.
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at petsc.dev  Fri Oct 20 09:11:43 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 20 Oct 2023 10:11:43 -0400
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To:
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
 <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
Message-ID:

> On Oct 20, 2023, at 7:12 AM, Matthew Knepley wrote:
>
> On Thu, Oct 19, 2023 at 8:35 PM Jorge Nin wrote:
>> Hi Matthew,
>>
>> Thanks for the response. It actually seems like the matrix is very sparse
>> (only about 0.99% of the entries are nonzero, from what I'm measuring).
>> It's an FEA solver, so it would make sense.
>> My current guess is that the optimization flags are making a large
>> difference for the M1 Mac, but I am also surprised it makes such a huge
>> difference.
>>
>> It's why I was asking if there was a resource or some other way to use my
>> own version of PETSc with Conda.

   What do you mean by "with Conda"? You can enter the Conda environment,
configure and compile PETSc, and then link your code against this PETSc
library (instead of the one provided by Conda). By being in the Conda
environment it means it is using the Conda Python, the Conda compilers etc.

   Barry

Some users have difficulty configuring PETSc inside the Conda environment;
if your ./configure or make of PETSc fails just send configure.log (and
make.log) to petsc-maint at mcs.anl.gov and we'll figure out how to get it
compiled.

> We do not know how Conda works unfortunately.
>
>   Thanks,
>
>     Matt
>
>> I believe a 2-3x speed up is worth the hassle.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From miaodi1987 at gmail.com  Fri Oct 20 15:19:27 2023
From: miaodi1987 at gmail.com (Di Miao)
Date: Fri, 20 Oct 2023 13:19:27 -0700
Subject: [petsc-users] use MATSEQAIJMKL in 64-bit indices
Message-ID:

Hi,

I found that when compiled with the '--with-64-bit-indices=1' option, the
following three definitions are removed from petscconf.h:

#define PETSC_HAVE_MKL_SPARSE 1
#define PETSC_HAVE_MKL_SPARSE_OPTIMIZE 1
#define PETSC_HAVE_MKL_SPARSE_SP2M_FEATURE 1

I believe MKL can also use 64-bit indices (libmkl_intel_ilp64). I tried to
add '--with-mkl_sparse=1 --with-mkl_sparse_optimize=1' to the configuration,
but that did not succeed.

Could you tell me whether it is possible to use the MATSEQAIJMKL matrix type
in 64-bit mode?

Regards,
Di
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov  Fri Oct 20 15:52:11 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 20 Oct 2023 15:52:11 -0500 (CDT)
Subject: [petsc-users] use MATSEQAIJMKL in 64-bit indices
In-Reply-To:
References:
Message-ID: <5c512d72-fc62-84c3-c7a6-b2848f352c01@mcs.anl.gov>

Try using the additional option --with-64-bit-blas-indices=1

Satish

On Fri, 20 Oct 2023, Di Miao wrote:

> Hi,
>
> I found that when compiled with the '--with-64-bit-indices=1' option, the
> following three definitions are removed from petscconf.h:
>
> #define PETSC_HAVE_MKL_SPARSE 1
> #define PETSC_HAVE_MKL_SPARSE_OPTIMIZE 1
> #define PETSC_HAVE_MKL_SPARSE_SP2M_FEATURE 1
>
> I believe MKL can also use 64-bit indices (libmkl_intel_ilp64). I tried to
> add '--with-mkl_sparse=1 --with-mkl_sparse_optimize=1' to the
> configuration, but that did not succeed.
>
> Could you tell me whether it is possible to use the MATSEQAIJMKL matrix
> type in 64-bit mode?
>
> Regards,
> Di
>

From miaodi1987 at gmail.com  Fri Oct 20 16:50:29 2023
From: miaodi1987 at gmail.com (Di Miao)
Date: Fri, 20 Oct 2023 14:50:29 -0700
Subject: [petsc-users] use MATSEQAIJMKL in 64-bit indices
In-Reply-To: <5c512d72-fc62-84c3-c7a6-b2848f352c01@mcs.anl.gov>
References: <5c512d72-fc62-84c3-c7a6-b2848f352c01@mcs.anl.gov>
Message-ID:

Thanks, it worked.

Di

On Fri, Oct 20, 2023 at 1:52 PM Satish Balay wrote:

> Try using the additional option --with-64-bit-blas-indices=1
>
> Satish
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From andrsd at gmail.com  Sat Oct 21 07:36:06 2023
From: andrsd at gmail.com (David Andrs)
Date: Sat, 21 Oct 2023 06:36:06 -0600
Subject: [petsc-users] Performance of Conda Binary vs Self Compiled Version
In-Reply-To: <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
References: <1515C79D-73E4-4345-A0E1-F47D870CA3E8@Mit.edu>
 <92F64F88-3F08-46AE-BDDF-6CD3602AE5D4@Mit.edu>
Message-ID:

Hi Jorge.

If you are using PETSc from conda-forge, then you can go look at the
https://github.com/conda-forge/petsc-feedstock repo. That is where the
conda recipe (i.e. the description of how to build the conda package) is
located. Under the `recipe` directory, you will see the `build.sh` script,
which has the configure line, etc. included.

You can adapt the recipe to your needs and then deploy your own version to
your own conda channel (it would be hosted on anaconda.org - you will need
an account there). The whole process is quite complex and beyond the scope
of this email. conda has a lot of documentation for this, though. This
could be a good starting point in case you want to dive into this:
https://docs.conda.io/projects/conda-build/en/stable/concepts/recipe.html

--
David

> On Oct 19, 2023, at 18:35, Jorge Nin wrote:
>
> It's why I was asking if there was a resource or some other way to use my
> own version of PETSc with Conda.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mfadams at lbl.gov  Mon Oct 23 07:15:44 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 23 Oct 2023 08:15:44 -0400
Subject: [petsc-users] MatSetValue problem (in Fortran)
Message-ID:

I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in
v3.19 and v3.20, and it works with v3.17.
They are trying to get a line number, but I was thinking it might be worth
trying the old dumb MatSetValues.
Is that possible?
Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From junchao.zhang at gmail.com  Mon Oct 23 09:09:07 2023
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 23 Oct 2023 09:09:07 -0500
Subject: [petsc-users] MatSetValue problem (in Fortran)
In-Reply-To:
References:
Message-ID:

Copy the code block of MatSetValues_MPIAIJ() in 3.17 to 3.20. If it works,
then it is possible :)

--Junchao Zhang

On Mon, Oct 23, 2023 at 7:16 AM Mark Adams wrote:

> I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in
> v3.19 and v3.20, and it works with v3.17.
> They are trying to get a line number, but I was thinking it might be
> worth trying the old dumb MatSetValues.
> Is that possible?
>
> Thanks,
> Mark
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Mon Oct 23 09:18:33 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 23 Oct 2023 10:18:33 -0400
Subject: [petsc-users] MatSetValue problem (in Fortran)
In-Reply-To:
References:
Message-ID: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev>

   Are they then not using any preallocation? The "old dumb MatSetValues"
used default preallocation. To reproduce that they can call
MatXAIJSetPreallocation() with 10 and 3 as the local and nonlocal number of
nonzeros per row.

   Best to use -start_in_debugger or -on_error_attach_debugger to find the
details of the crash.

> On Oct 23, 2023, at 8:15 AM, Mark Adams wrote:
>
> I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in
> v3.19 and v3.20, and it works with v3.17.
> They are trying to get a line number, but I was thinking it might be
> worth trying the old dumb MatSetValues.
> Is that possible?
>
> Thanks,
> Mark

From mfadams at lbl.gov  Mon Oct 23 10:22:57 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 23 Oct 2023 11:22:57 -0400
Subject: [petsc-users] MatSetValue problem (in Fortran)
In-Reply-To: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev>
References: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev>
Message-ID:

On Mon, Oct 23, 2023 at 10:18 AM Barry Smith wrote:

>    Are they then not using any preallocation?

I asked but have not gotten a response.

>    The "old dumb MatSetValues" used default preallocation. To reproduce
> that they can call MatXAIJSetPreallocation() with 10 and 3 as the local
> and nonlocal number of nonzeros per row.

I am sure they use some sort of preallocation, or they would have seen bad
performance problems, and they are pretty mature PETSc users, but we have
to wait and see.

>    Best to use -start_in_debugger or -on_error_attach_debugger to find
> the details of the crash.

They did give me a ddt --offline .hdftm file, attached
(nice when you can't open a window, eg, Frontier).
I asked them to try to get a line number.

Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
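A minimal C sketch of the explicit-preallocation idea Barry describes above,
with made-up sizes. It uses MatMPIAIJSetPreallocation rather than the
MatXAIJSetPreallocation() call he names, because it accepts scalar per-row
defaults (10 in the diagonal block, 3 off-process) directly; it is an
illustration only, not the user's actual code.

#include <petscmat.h>

/* Sketch: create an MPIAIJ matrix with explicit preallocation, assuming
   nloc local rows/columns per rank. */
static PetscErrorCode CreatePreallocatedMat(MPI_Comm comm, PetscInt nloc, Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, nloc, nloc, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetType(*A, MATMPIAIJ));
  /* At most 10 nonzeros per row in the local diagonal block and 3 in the
     off-process block; NULL means "use the scalar default for every row". */
  PetscCall(MatMPIAIJSetPreallocation(*A, 10, NULL, 3, NULL));
  /* The matrix is now ready for MatSetValues(); exceeding the preallocation
     is either slow or an error, depending on MAT_NEW_NONZERO_ALLOCATION_ERR. */
  PetscFunctionReturn(PETSC_SUCCESS);
}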
URL: From bsmith at petsc.dev Mon Oct 23 11:45:06 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 23 Oct 2023 12:45:06 -0400 Subject: [petsc-users] MatSetValue problem (in Fortran) In-Reply-To: References: <83DB8B24-922C-41F0-9C77-6C2E66D776D2@petsc.dev> Message-ID: It's nice to see DOE is still buying computers for 100s of millions of dollars that do not support a debugger. > On Oct 23, 2023, at 11:22?AM, Mark Adams wrote: > > > > On Mon, Oct 23, 2023 at 10:18?AM Barry Smith > wrote: >> >> Are they then not using any preallocation? > > I asked but have not gotten a response. > >> The "old dumb MatSetValues" used default preallocation. To reproduce that they can call MatXAIJSetPreallocation() with 10 and 3 as the local and nonlocal number of nonzeros per row. >> > > I am sure they use some sort of preallocation, or they would have seen bad performance problems and they are pretty mature PETSc users, but we have to wait and see. > >> Best to use -start_in_debugger or -on_error_attach_debugger to find the details of the crash >> > > They did give me a ddt --offline .hdftm file, attached > (nice when you can't open a window, eg, Frontier). > I asked them to try to get a line number. > > Thanks, > Mark > >> >> >> > On Oct 23, 2023, at 8:15?AM, Mark Adams > wrote: >> > >> > I have a Fortran user that is getting a segv in MatSetValues_MPIAIJ in v3.19 and v3.20 and it works with v3.17. >> > They are trying to get a line number but I was thinking it might be worth trying the old dumb MatSetValues. >> > Is that possible? >> > >> > Thanks, >> > Mark >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rene.chenard at me.com Mon Oct 23 18:16:06 2023 From: rene.chenard at me.com (Rene Chenard) Date: Mon, 23 Oct 2023 19:16:06 -0400 Subject: [petsc-users] Seeking Clarification on SNES Solver Behavior Message-ID: <2A793409-82F7-4A70-A2E4-882D39B6ABE0@me.com> Hi! We have recently noticed some inconsistencies in the behavior of the SNES solver when using different solver types, and we would greatly appreciate your insights in resolving this matter. While working with SNESSolve in parallel, we have encountered a discrepancy in the behavior of the evaluation functions for the ComputeFunction and the JacobianFunction. Specifically, there seems to be an inconsistency in whether Vec x receives automatic updates to its ghosts or if manual updates are required (with calls to VecGhostUpdateBegin/End). For instance, when using the ngmres solver, the ghosts of Vec x are adequately updated. However, when employing the nrichardson solver, it appears that manual updates to the ghosts are necessary. It's important to note that we do not utilize the DM object in our implementation, as we have developed our own solution to manage models and discretization. To better understand the root cause of this behavior, we kindly request your assistance in determining if we may be overlooking something in our implementation, or if there are inherent inconsistencies in the SNES solver itself. Your expertise in this matter would be invaluable to us, and we thank you in advance for your consideration and support. Warm regards, ?Ren? Chenard Research Professional at Universit? 
Laval rene.chenard.1 at ulaval.ca From knepley at gmail.com Tue Oct 24 07:23:17 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Oct 2023 08:23:17 -0400 Subject: [petsc-users] Seeking Clarification on SNES Solver Behavior In-Reply-To: <2A793409-82F7-4A70-A2E4-882D39B6ABE0@me.com> References: <2A793409-82F7-4A70-A2E4-882D39B6ABE0@me.com> Message-ID: On Mon, Oct 23, 2023 at 10:23?PM Rene Chenard via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi! > > We have recently noticed some inconsistencies in the behavior of the SNES > solver when using different solver types, and we would greatly appreciate > your insights in resolving this matter. > > While working with SNESSolve in parallel, we have encountered a > discrepancy in the behavior of the evaluation functions for the > ComputeFunction and the JacobianFunction. Specifically, there seems to be > an inconsistency in whether Vec x receives automatic updates to its ghosts > or if manual updates are required (with calls to VecGhostUpdateBegin/End). > > For instance, when using the ngmres solver, the ghosts of Vec x are > adequately updated. However, when employing the nrichardson solver, it > appears that manual updates to the ghosts are necessary. > > It's important to note that we do not utilize the DM object in our > implementation, as we have developed our own solution to manage models and > discretization. > > To better understand the root cause of this behavior, we kindly request > your assistance in determining if we may be overlooking something in our > implementation, or if there are inherent inconsistencies in the SNES solver > itself. > > Your expertise in this matter would be invaluable to us, and we thank you > in advance for your consideration and support. > Since you are not using DM, does that mean that you register a callback with https://petsc.org/main/manualpages/SNES/DMSNESSetFunction/ If so, we do not do any kind of local-to-global calls. I am not sure how NGMRES would populate local vectors for you. My guess is that you have a ghost update call somewhere in your callback, and this gets hit in the NGMRES because it has extra residual evaluations. We can be more specific with more details about the code. Thanks, Matt > Warm regards, > > ?Ren? Chenard > Research Professional at Universit? Laval > rene.chenard.1 at ulaval.ca > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From damynchipman at u.boisestate.edu Tue Oct 24 18:53:25 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Tue, 24 Oct 2023 17:53:25 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators Message-ID: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> Hi PETSc developers, In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. 
From what I can tell, I can only copy a PETSc object onto the same communicator. My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more "elegant" approach within PETSc.

Thanks in advance,

Damyn Chipman
Boise State University
PhD Candidate
Computational Sciences and Engineering
damynchipman at u.boisestate.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jed at jedbrown.org Tue Oct 24 22:51:46 2023
From: jed at jedbrown.org (Jed Brown)
Date: Tue, 24 Oct 2023 21:51:46 -0600
Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators
In-Reply-To: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu>
References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu>
Message-ID: <87h6mfmka5.fsf@jedbrown.org>

You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks.

That said, you usually have a function that assembles the matrix and you can just call that on the new communicator.

Damyn Chipman writes:

> Hi PETSc developers,
>
> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another?
>
> The more detailed scenario is this: I'm working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node's data (i.e., a Mat object) from a sibling's communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator.
>
> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more "elegant" approach within PETSc.
>
> Thanks in advance,
>
> Damyn Chipman
> Boise State University
> PhD Candidate
> Computational Sciences and Engineering
> damynchipman at u.boisestate.edu

From joauma.marichal at uclouvain.be Wed Oct 25 07:31:43 2023
From: joauma.marichal at uclouvain.be (Joauma Marichal)
Date: Wed, 25 Oct 2023 12:31:43 +0000
Subject: [petsc-users] DMSwarm on multiple processors
Message-ID: 

Hello,

I am using the DMSwarm library in some Eulerian-Lagrangian approach to have vapor bubbles in water.
I have obtained nice results recently and wanted to perform bigger simulations.
Unfortunately, when I increase the number of processors used to run the simulation, I get the following error: free(): invalid size [cns136:590327] *** Process received signal *** [cns136:590327] Signal: Aborted (6) [cns136:590327] Signal code: (-6) [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642] [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] [cns136:590327] [13] ./cobpor[0x4418dc] [cns136:590327] [14] ./cobpor[0x408b63] [cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] [cns136:590327] [16] ./cobpor[0x40bdee] [cns136:590327] *** End of error message *** -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted). -------------------------------------------------------------------------- When I reduce the number of processors the error disappears and when I run my code without the vapor bubbles it also works. The problem seems to take place at this moment: DMCreate(PETSC_COMM_WORLD,swarm); DMSetType(*swarm,DMSWARM); DMSetDimension(*swarm,3); DMSwarmSetType(*swarm,DMSWARM_PIC); DMSwarmSetCellDM(*swarm,*dmcell); Thanks a lot for your help. Best regards, Joauma -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 25 07:45:38 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Oct 2023 08:45:38 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: References: Message-ID: On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint < petsc-maint at mcs.anl.gov> wrote: > Hello, > > > > I am using the DMSwarm library in some Eulerian-Lagrangian approach to > have vapor bubbles in water. > > I have obtained nice results recently and wanted to perform bigger > simulations. 
Unfortunately, when I increase the number of processors used > to run the simulation, I get the following error: > > > > free(): invalid size > > [cns136:590327] *** Process received signal *** > > [cns136:590327] Signal: Aborted (6) > > [cns136:590327] Signal code: (-6) > > [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] > > [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] > > [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] > > [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] > > [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] > > [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] > > [cns136:590327] [ 6] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] > > [cns136:590327] [ 7] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642] > > [cns136:590327] [ 8] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] > > [cns136:590327] [ 9] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] > > [cns136:590327] [10] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] > > [cns136:590327] [11] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] > > [cns136:590327] [12] > /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] > > [cns136:590327] [13] ./cobpor[0x4418dc] > > [cns136:590327] [14] ./cobpor[0x408b63] > > [cns136:590327] [15] > /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] > > [cns136:590327] [16] ./cobpor[0x40bdee] > > [cns136:590327] *** End of error message *** > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited > on signal 6 (Aborted). > > -------------------------------------------------------------------------- > > > > When I reduce the number of processors the error disappears and when I run > my code without the vapor bubbles it also works. > > The problem seems to take place at this moment: > > > > DMCreate(PETSC_COMM_WORLD,swarm); > > DMSetType(*swarm,DMSWARM); > > DMSetDimension(*swarm,3); > > DMSwarmSetType(*swarm,DMSWARM_PIC); > > DMSwarmSetCellDM(*swarm,*dmcell); > > > > > > Thanks a lot for your help. > Things that would help us track this down: 1) The smallest example where it fails 2) The smallest number of processes where it fails 3) A stack trace of the failure 4) A simple example that we can run that also fails Thanks, Matt > Best regards, > > > > Joauma > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
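(As a starting point for the simple standalone example requested in 4), here is a bare-bones skeleton that exercises only the swarm setup quoted above; the DMDA sizes and every other choice are placeholders rather than the actual application code.)

    #include <petscdmda.h>
    #include <petscdmswarm.h>

    int main(int argc, char **argv)
    {
      DM da, swarm;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* Small background cell DM standing in for the application's DMDA (sizes are placeholders) */
      PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                             DMDA_STENCIL_BOX, 32, 32, 32, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                             1, 1, NULL, NULL, NULL, &da));
      PetscCall(DMSetFromOptions(da));
      PetscCall(DMSetUp(da));

      /* Swarm setup, matching the calls quoted above */
      PetscCall(DMCreate(PETSC_COMM_WORLD, &swarm));
      PetscCall(DMSetType(swarm, DMSWARM));
      PetscCall(DMSetDimension(swarm, 3));
      PetscCall(DMSwarmSetType(swarm, DMSWARM_PIC));
      PetscCall(DMSwarmSetCellDM(swarm, da));
      PetscCall(DMSwarmFinalizeFieldRegister(swarm));

      PetscCall(DMDestroy(&swarm));
      PetscCall(DMDestroy(&da));
      PetscCall(PetscFinalize());
      return 0;
    }

Running such a skeleton with the same number of MPI ranks at which the full code fails would show whether the problem can be reproduced independently of the rest of the application.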
From qiyuelu1 at gmail.com Wed Oct 25 10:09:08 2023
From: qiyuelu1 at gmail.com (Qiyue Lu)
Date: Wed, 25 Oct 2023 10:09:08 -0500
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
Message-ID: 

Hello,
I have an in-house code enabled OpenMP and it works. Now I am trying to incorporate PETSc as the linear solver and build together using the building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the OpenMP part doesn't work anymore.
Should I re-configure the petsc installation with --with-openmp=1 option? I wonder are the building rules affected by this missing option?

Thanks,
Qiyue Lu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev Wed Oct 25 10:14:38 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 25 Oct 2023 11:14:38 -0400
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
In-Reply-To: 
References: 
Message-ID: <3A18A24D-5445-4F2F-A932-75B2D1C6B6CD@petsc.dev>

   To have OpenMP available from the PETSc make system you need to have --with-openmp among the PETSc ./configure options

> On Oct 25, 2023, at 11:09 AM, Qiyue Lu wrote:
>
> Hello,
> I have an in-house code enabled OpenMP and it works. Now I am trying to incorporate PETSc as the linear solver and build together using the building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the OpenMP part doesn't work anymore.
> Should I re-configure the petsc installation with --with-openmp=1 option? I wonder are the building rules affected by this missing option?
>
> Thanks,
> Qiyue Lu

From knepley at gmail.com Wed Oct 25 10:48:07 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 25 Oct 2023 11:48:07 -0400
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
In-Reply-To: 
References: 
Message-ID: 

On Wed, Oct 25, 2023 at 11:12 AM Qiyue Lu wrote:

> Hello,
> I have an in-house code enabled OpenMP and it works. Now I am trying to
> incorporate PETSc as the linear solver and build together using the
> building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the
> OpenMP part doesn't work anymore.
> Should I re-configure the petsc installation with --with-openmp=1 option?
> I wonder are the building rules affected by this missing option?
>

There are parts of PETSc that are not threadsafe unless you configure using --with-threadsafety. If you plan to call PETSc methods from different threads, you need this.

  Thanks,

     Matt

> Thanks,
> Qiyue Lu
>

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov Wed Oct 25 10:54:24 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 25 Oct 2023 10:54:24 -0500 (CDT)
Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules
In-Reply-To: 
References: 
Message-ID: 

On Wed, 25 Oct 2023, Qiyue Lu wrote:

> Hello,
> I have an in-house code enabled OpenMP and it works. Now I am trying to
> incorporate PETSc as the linear solver and build together using the
> building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the
> OpenMP part doesn't work anymore.

If you are looking at building only your sources with openmp - using petsc formatted makefile [using petsc build rules], you can specify it via CFLAGS - either in makefile - or on command line.
>>>>>>> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the corresponding make fules] [balay at pj01 tutorials]$ make ex2 mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ make clean [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ <<<<< Satish > Should I re-configure the petsc installation with --with-openmp=1 option? I > wonder are the building rules affected by this missing option? > > Thanks, > Qiyue Lu > From qiyuelu1 at gmail.com Wed Oct 25 11:06:14 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 11:06:14 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: Thanks for your reply, using this configurations: *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 --with-threadsafety=1* However, I got an error like: *nvcc fatal : Unknown option '-fopenmp'* Previously, when I don't have --with-openmp for the configuration, the PETSc make system can build my *.cu code using nvcc and g++, of course, OpenMP doesn't work. Now with this --with-openmp option, it cannot even build. The interesting thing is, I got this error even after removing the *-fopenmp* from *CXXFLAGS* contents: CXXFLAGS=-std=c++17 LDFLAGS= CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules Thanks, Qiyue Lu On Wed, Oct 25, 2023 at 10:54?AM Satish Balay wrote: > > On Wed, 25 Oct 2023, Qiyue Lu wrote: > > > Hello, > > I have an in-house code enabled OpenMP and it works. Now I am trying to > > incorporate PETSc as the linear solver and build together using the > > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the > > OpenMP part doesn't work anymore. > > If you are looking at building only your sources with openmp - using petsc > formatted makefile [using petsc build rules], > you can specify it via CFLAGS - either in makefile - or on command line. 
> > >>>>>>> > For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the > corresponding make fules] > > [balay at pj01 tutorials]$ make ex2 > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ make clean > [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ > <<<<< > > Satish > > > > Should I re-configure the petsc installation with --with-openmp=1 > option? I > > wonder are the building rules affected by this missing option? > > > > Thanks, > > Qiyue Lu > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Wed Oct 25 11:35:35 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 11:35:35 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: Even with CXXFLAGS=-Xcompiler -fopenmp -std=c++17 LDFLAGS= -Xcompiler -fopenmp CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules won't work. On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: > Thanks for your reply, using this configurations: > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 > --with-threadsafety=1* > However, I got an error like: > *nvcc fatal : Unknown option '-fopenmp'* > Previously, when I don't have --with-openmp for the configuration, the > PETSc make system can build my *.cu code using nvcc and g++, of course, > OpenMP doesn't work. Now with this --with-openmp option, it cannot even > build. The interesting thing is, I got this error even after removing the > *-fopenmp* from *CXXFLAGS* contents: > CXXFLAGS=-std=c++17 > LDFLAGS= > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > Thanks, > Qiyue Lu > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay wrote: > >> >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> >> > Hello, >> > I have an in-house code enabled OpenMP and it works. Now I am trying to >> > incorporate PETSc as the linear solver and build together using the >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. 
However, I found the >> > OpenMP part doesn't work anymore. >> >> If you are looking at building only your sources with openmp - using >> petsc formatted makefile [using petsc build rules], >> you can specify it via CFLAGS - either in makefile - or on command line. >> >> >>>>>>> >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the >> corresponding make fules] >> >> [balay at pj01 tutorials]$ make ex2 >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> -L/home/balay/petsc/arch-linux-c-debug/lib >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> -lquadmath -o ex2 >> [balay at pj01 tutorials]$ make clean >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> -L/home/balay/petsc/arch-linux-c-debug/lib >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> -lquadmath -o ex2 >> [balay at pj01 tutorials]$ >> <<<<< >> >> Satish >> >> >> > Should I re-configure the petsc installation with --with-openmp=1 >> option? I >> > wonder are the building rules affected by this missing option? >> > >> > Thanks, >> > Qiyue Lu >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Oct 25 11:44:06 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 25 Oct 2023 11:44:06 -0500 (CDT) Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: I guess the flag you are looking for is CUDAFLAGS >>> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp /usr/local/cuda/bin/nvcc -o ex100.o -c -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-debug/include -I/usr/local/cuda/include `pwd`/ex100.cu mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib -L/scratch/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex100 rm ex100.o balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ <<< Satish On Wed, 25 Oct 2023, Qiyue Lu wrote: > Even with > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 > LDFLAGS= -Xcompiler -fopenmp > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > won't work. > > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: > > > Thanks for your reply, using this configurations: > > > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 > > --with-threadsafety=1* > > However, I got an error like: > > *nvcc fatal : Unknown option '-fopenmp'* > > Previously, when I don't have --with-openmp for the configuration, the > > PETSc make system can build my *.cu code using nvcc and g++, of course, > > OpenMP doesn't work. Now with this --with-openmp option, it cannot even > > build. The interesting thing is, I got this error even after removing the > > *-fopenmp* from *CXXFLAGS* contents: > > CXXFLAGS=-std=c++17 > > LDFLAGS= > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > > > > > Thanks, > > Qiyue Lu > > > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay wrote: > > > >> > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: > >> > >> > Hello, > >> > I have an in-house code enabled OpenMP and it works. Now I am trying to > >> > incorporate PETSc as the linear solver and build together using the > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the > >> > OpenMP part doesn't work anymore. > >> > >> If you are looking at building only your sources with openmp - using > >> petsc formatted makefile [using petsc build rules], > >> you can specify it via CFLAGS - either in makefile - or on command line. 
> >> > >> >>>>>>> > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the > >> corresponding make fules] > >> > >> [balay at pj01 tutorials]$ make ex2 > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > >> -L/home/balay/petsc/arch-linux-c-debug/lib > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > >> -lquadmath -o ex2 > >> [balay at pj01 tutorials]$ make clean > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > >> -L/home/balay/petsc/arch-linux-c-debug/lib > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > >> -lquadmath -o ex2 > >> [balay at pj01 tutorials]$ > >> <<<<< > >> > >> Satish > >> > >> > >> > Should I re-configure the petsc installation with --with-openmp=1 > >> option? I > >> > wonder are the building rules affected by this missing option? > >> > > >> > Thanks, > >> > Qiyue Lu > >> > > >> > >> > From qiyuelu1 at gmail.com Wed Oct 25 11:47:53 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 11:47:53 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: Thanks, however, CUDAFLAGS doesn't work. Even I removed all -fopenmp string from the make file, it still complains nvcc doesn't know -fopenmp. It seems by enabling --with-openmp, there is a background -fopenmp option added without -Xcompiler to any compiler. 
On Wed, Oct 25, 2023 at 11:44?AM Satish Balay wrote: > I guess the flag you are looking for is CUDAFLAGS > > >>> > balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 > CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp > /usr/local/cuda/bin/nvcc -o ex100.o -c > -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx > -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo > -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp > -I/scratch/balay/petsc/include > -I/scratch/balay/petsc/arch-linux-c-debug/include > -I/usr/local/cuda/include `pwd`/ex100.cu > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o > -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib > -L/scratch/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 > -L/usr/local/cuda/lib64/stubs > -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib > -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 > -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart > -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex100 > rm ex100.o > balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ > <<< > > Satish > > On Wed, 25 Oct 2023, Qiyue Lu wrote: > > > Even with > > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 > > LDFLAGS= -Xcompiler -fopenmp > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > won't work. > > > > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: > > > > > Thanks for your reply, using this configurations: > > > > > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 > --with-openmp=1 > > > --with-threadsafety=1* > > > However, I got an error like: > > > *nvcc fatal : Unknown option '-fopenmp'* > > > Previously, when I don't have --with-openmp for the configuration, the > > > PETSc make system can build my *.cu code using nvcc and g++, of course, > > > OpenMP doesn't work. Now with this --with-openmp option, it cannot even > > > build. The interesting thing is, I got this error even after removing > the > > > *-fopenmp* from *CXXFLAGS* contents: > > > CXXFLAGS=-std=c++17 > > > LDFLAGS= > > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common > > > include ${PETSC_DIR}/lib/petsc/conf/variables > > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > > > > > > > > > Thanks, > > > Qiyue Lu > > > > > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay > wrote: > > > > > >> > > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: > > >> > > >> > Hello, > > >> > I have an in-house code enabled OpenMP and it works. Now I am > trying to > > >> > incorporate PETSc as the linear solver and build together using the > > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I > found the > > >> > OpenMP part doesn't work anymore. > > >> > > >> If you are looking at building only your sources with openmp - using > > >> petsc formatted makefile [using petsc build rules], > > >> you can specify it via CFLAGS - either in makefile - or on command > line. 
> > >> > > >> >>>>>>> > > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with > the > > >> corresponding make fules] > > >> > > >> [balay at pj01 tutorials]$ make ex2 > > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > > >> -L/home/balay/petsc/arch-linux-c-debug/lib > > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm > -lX11 > > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath > -lstdc++ > > >> -lquadmath -o ex2 > > >> [balay at pj01 tutorials]$ make clean > > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > > >> -L/home/balay/petsc/arch-linux-c-debug/lib > > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm > -lX11 > > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath > -lstdc++ > > >> -lquadmath -o ex2 > > >> [balay at pj01 tutorials]$ > > >> <<<<< > > >> > > >> Satish > > >> > > >> > > >> > Should I re-configure the petsc installation with --with-openmp=1 > > >> option? I > > >> > wonder are the building rules affected by this missing option? > > >> > > > >> > Thanks, > > >> > Qiyue Lu > > >> > > > >> > > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 25 11:57:45 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 25 Oct 2023 12:57:45 -0400 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: I apologize, my suggestion of adding the --with-openmp fails in your environment. You should follow Satish's recommendation of adding the flags in your makefile exactly as needed. If you are looking at building only your sources with openmp - using petsc formatted makefile [using petsc build rules], you can specify it via CFLAGS - either in makefile - or on command line. 
>>>>>>> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the corresponding make fules] [balay at pj01 tutorials]$ make ex2 mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ make clean [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib -L/home/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex2 [balay at pj01 tutorials]$ > On Oct 25, 2023, at 12:47?PM, Qiyue Lu wrote: > > Thanks, however, CUDAFLAGS doesn't work. > Even I removed all -fopenmp string from the make file, it still complains nvcc doesn't know -fopenmp. It seems by enabling --with-openmp, there is a background -fopenmp option added without -Xcompiler to any compiler. 
> > On Wed, Oct 25, 2023 at 11:44?AM Satish Balay > wrote: >> I guess the flag you are looking for is CUDAFLAGS >> >> >>> >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp >> /usr/local/cuda/bin/nvcc -o ex100.o -c -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-debug/include -I/usr/local/cuda/include `pwd`/ex100.cu >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib -L/scratch/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lquadmath -o ex100 >> rm ex100.o >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ >> <<< >> >> Satish >> >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> >> > Even with >> > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 >> > LDFLAGS= -Xcompiler -fopenmp >> > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > include ${PETSC_DIR}/lib/petsc/conf/variables >> > include ${PETSC_DIR}/lib/petsc/conf/rules >> > >> > won't work. >> > >> > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu > wrote: >> > >> > > Thanks for your reply, using this configurations: >> > > >> > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >> > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 --with-openmp=1 >> > > --with-threadsafety=1* >> > > However, I got an error like: >> > > *nvcc fatal : Unknown option '-fopenmp'* >> > > Previously, when I don't have --with-openmp for the configuration, the >> > > PETSc make system can build my *.cu code using nvcc and g++, of course, >> > > OpenMP doesn't work. Now with this --with-openmp option, it cannot even >> > > build. The interesting thing is, I got this error even after removing the >> > > *-fopenmp* from *CXXFLAGS* contents: >> > > CXXFLAGS=-std=c++17 >> > > LDFLAGS= >> > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > > include ${PETSC_DIR}/lib/petsc/conf/variables >> > > include ${PETSC_DIR}/lib/petsc/conf/rules >> > > >> > > >> > > >> > > Thanks, >> > > Qiyue Lu >> > > >> > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay > wrote: >> > > >> > >> >> > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> > >> >> > >> > Hello, >> > >> > I have an in-house code enabled OpenMP and it works. Now I am trying to >> > >> > incorporate PETSc as the linear solver and build together using the >> > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I found the >> > >> > OpenMP part doesn't work anymore. >> > >> >> > >> If you are looking at building only your sources with openmp - using >> > >> petsc formatted makefile [using petsc build rules], >> > >> you can specify it via CFLAGS - either in makefile - or on command line. 
>> > >> >> > >> >>>>>>> >> > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the >> > >> corresponding make fules] >> > >> >> > >> [balay at pj01 tutorials]$ make ex2 >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ make clean >> > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ >> > >> <<<<< >> > >> >> > >> Satish >> > >> >> > >> >> > >> > Should I re-configure the petsc installation with --with-openmp=1 >> > >> option? I >> > >> > wonder are the building rules affected by this missing option? >> > >> > >> > >> > Thanks, >> > >> > Qiyue Lu >> > >> > >> > >> >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Wed Oct 25 12:02:36 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Wed, 25 Oct 2023 12:02:36 -0500 Subject: [petsc-users] OpenMP doesn't work anymore with PETSc building rules In-Reply-To: References: Message-ID: NO, NO, NO, Any try or suggestions are meaningful. I appreciate help from all of you. Have a nice day. Qiyue Lu On Wed, Oct 25, 2023 at 11:57?AM Barry Smith wrote: > > I apologize, my suggestion of adding the --with-openmp fails in your > environment. You should follow Satish's recommendation of adding the flags > in your makefile exactly as needed. > > If you are looking at building only your sources with openmp - using petsc > formatted makefile [using petsc build rules], > you can specify it via CFLAGS - either in makefile - or on command line. 
> > > For ex: [this example is using src/ksp/ksp/tutorials/makefile - with the > corresponding make fules] > > [balay at pj01 tutorials]$ make ex2 > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic > ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ make clean > [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp > mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include > -I/home/balay/petsc/arch-linux-c-debug/include -Wl,-export-dynamic ex2.c > -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib > -L/home/balay/petsc/arch-linux-c-debug/lib > -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 > -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm -lX11 > -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ > -lquadmath -o ex2 > [balay at pj01 tutorials]$ > > > On Oct 25, 2023, at 12:47?PM, Qiyue Lu wrote: > > Thanks, however, CUDAFLAGS doesn't work. > Even I removed all -fopenmp string from the make file, it still complains > nvcc doesn't know -fopenmp. It seems by enabling --with-openmp, there is a > background -fopenmp option added without -Xcompiler to any compiler. 
> > On Wed, Oct 25, 2023 at 11:44?AM Satish Balay wrote: > >> I guess the flag you are looking for is CUDAFLAGS >> >> >>> >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ make ex100 >> CUDAFLAGS="-Xcompiler -fopenmp" LDFLAGS=-fopenmp >> /usr/local/cuda/bin/nvcc -o ex100.o -c >> -I/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/include -ccbin mpicxx >> -std=c++17 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo >> -gencode arch=compute_86,code=sm_86 -Xcompiler -fopenmp >> -I/scratch/balay/petsc/include >> -I/scratch/balay/petsc/arch-linux-c-debug/include >> -I/usr/local/cuda/include `pwd`/ex100.cu >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -g3 -O0 -fopenmp -Wl,-export-dynamic ex100.o >> -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib >> -L/scratch/balay/petsc/arch-linux-c-debug/lib >> -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 >> -L/usr/local/cuda/lib64/stubs >> -Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib >> -L/nfs/gce/projects/petsc/soft/u22.04/mpich-4.0.2/lib >> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 >> -L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart >> -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ >> -lquadmath -o ex100 >> rm ex100.o >> balay at petsc-gpu-01:/scratch/balay/petsc/src/vec/vec/tests$ >> <<< >> >> Satish >> >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> >> > Even with >> > CXXFLAGS=-Xcompiler -fopenmp -std=c++17 >> > LDFLAGS= -Xcompiler -fopenmp >> > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > include ${PETSC_DIR}/lib/petsc/conf/variables >> > include ${PETSC_DIR}/lib/petsc/conf/rules >> > >> > won't work. >> > >> > On Wed, Oct 25, 2023 at 11:06?AM Qiyue Lu wrote: >> > >> > > Thanks for your reply, using this configurations: >> > > >> > > *--with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >> > > --download-f2cblaslapack=1 --with-cudac=nvcc --with-cuda=1 >> --with-openmp=1 >> > > --with-threadsafety=1* >> > > However, I got an error like: >> > > *nvcc fatal : Unknown option '-fopenmp'* >> > > Previously, when I don't have --with-openmp for the configuration, the >> > > PETSc make system can build my *.cu code using nvcc and g++, of >> course, >> > > OpenMP doesn't work. Now with this --with-openmp option, it cannot >> even >> > > build. The interesting thing is, I got this error even after removing >> the >> > > *-fopenmp* from *CXXFLAGS* contents: >> > > CXXFLAGS=-std=c++17 >> > > LDFLAGS= >> > > CXXPPFLAGS=-I/u/qiyuelu1/cuda/cuda-samples/Common >> > > include ${PETSC_DIR}/lib/petsc/conf/variables >> > > include ${PETSC_DIR}/lib/petsc/conf/rules >> > > >> > > >> > > >> > > Thanks, >> > > Qiyue Lu >> > > >> > > On Wed, Oct 25, 2023 at 10:54?AM Satish Balay >> wrote: >> > > >> > >> >> > >> On Wed, 25 Oct 2023, Qiyue Lu wrote: >> > >> >> > >> > Hello, >> > >> > I have an in-house code enabled OpenMP and it works. Now I am >> trying to >> > >> > incorporate PETSc as the linear solver and build together using the >> > >> > building rules in $PETSC_HOME/lib/petsc/conf/rules. However, I >> found the >> > >> > OpenMP part doesn't work anymore. >> > >> >> > >> If you are looking at building only your sources with openmp - using >> > >> petsc formatted makefile [using petsc build rules], >> > >> you can specify it via CFLAGS - either in makefile - or on command >> line. 
>> > >> >> > >> >>>>>>> >> > >> For ex: [this example is using src/ksp/ksp/tutorials/makefile - with >> the >> > >> corresponding make fules] >> > >> >> > >> [balay at pj01 tutorials]$ make ex2 >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include >> -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm >> -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath >> -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ make clean >> > >> [balay at pj01 tutorials]$ make ex2 CFLAGS=-fopenmp >> > >> mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> > >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> > >> -fvisibility=hidden -g3 -O0 -fopenmp -I/home/balay/petsc/include >> > >> -I/home/balay/petsc/arch-linux-c-debug/include >> -Wl,-export-dynamic >> > >> ex2.c -Wl,-rpath,/home/balay/petsc/arch-linux-c-debug/lib >> > >> -L/home/balay/petsc/arch-linux-c-debug/lib >> > >> -Wl,-rpath,/software/mpich-4.1.1/lib -L/software/mpich-4.1.1/lib >> > >> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/13 >> > >> -L/usr/lib/gcc/x86_64-redhat-linux/13 -lpetsc -llapack -lblas -lm >> -lX11 >> > >> -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath >> -lstdc++ >> > >> -lquadmath -o ex2 >> > >> [balay at pj01 tutorials]$ >> > >> <<<<< >> > >> >> > >> Satish >> > >> >> > >> >> > >> > Should I re-configure the petsc installation with --with-openmp=1 >> > >> option? I >> > >> > wonder are the building rules affected by this missing option? >> > >> > >> > >> > Thanks, >> > >> > Qiyue Lu >> > >> > >> > >> >> > >> >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From damynchipman at u.boisestate.edu Wed Oct 25 14:38:10 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Wed, 25 Oct 2023 13:38:10 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <87h6mfmka5.fsf@jedbrown.org> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> Message-ID: <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? -Damyn > On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: > > You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. > > That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. > > Damyn Chipman writes: > >> Hi PETSc developers, >> >> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? 
>> >> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >> >> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >> >> Thanks in advance, >> >> Damyn Chipman >> Boise State University >> PhD Candidate >> Computational Sciences and Engineering >> damynchipman at u.boisestate.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 25 14:47:09 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 25 Oct 2023 15:47:09 -0400 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: If the matrices are stored as dense it is likely new code is the best way to go. What pieces live on the sub communicator? Is it an m by N matrix where m is the number of rows (on that rank) and N is the total number of columns in the final matrix? Or are they smaller "chunks" that need to be combined together? Barry > On Oct 25, 2023, at 3:38?PM, Damyn Chipman wrote: > > Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? > > -Damyn > >> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >> >> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >> >> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >> >> Damyn Chipman writes: >> >>> Hi PETSc developers, >>> >>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>> >>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>> >>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. 
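A minimal sketch of the MatCreateSubMatrix route described above: the data is first placed in a matrix that lives on the larger communicator but has every row on one rank, and MatCreateSubMatrix() then hands each rank its slice. The sizes, the dense type, and the even row split are illustrative assumptions, not part of the original discussion.

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  MPI_Comm    comm;
  PetscMPIInt rank, size;
  PetscInt    M = 16, N = 16, mloc;
  Mat         Aowned, Aredist;
  IS          rows;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  comm = PETSC_COMM_WORLD; /* stand-in for the "union" communicator */
  PetscCallMPI(MPI_Comm_rank(comm, &rank));
  PetscCallMPI(MPI_Comm_size(comm, &size));

  /* A parallel matrix on comm whose rows all start out on rank 0 */
  PetscCall(MatCreate(comm, &Aowned));
  PetscCall(MatSetSizes(Aowned, rank ? 0 : M, PETSC_DECIDE, M, N));
  PetscCall(MatSetType(Aowned, MATDENSE));
  PetscCall(MatSetUp(Aowned));
  if (!rank) { /* fill rank 0's rows; the values are arbitrary for the sketch */
    for (PetscInt i = 0; i < M; i++)
      for (PetscInt j = 0; j < N; j++) PetscCall(MatSetValue(Aowned, i, j, (PetscScalar)(i * N + j), INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(Aowned, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(Aowned, MAT_FINAL_ASSEMBLY));

  /* Redistribute: each rank asks for a contiguous slice of the rows */
  mloc = M / size; /* assumes size divides M evenly */
  PetscCall(ISCreateStride(comm, mloc, rank * mloc, 1, &rows));
  PetscCall(MatCreateSubMatrix(Aowned, rows, NULL, MAT_INITIAL_MATRIX, &Aredist));
  PetscCall(MatView(Aredist, PETSC_VIEWER_STDOUT_WORLD));

  PetscCall(ISDestroy(&rows));
  PetscCall(MatDestroy(&Aowned));
  PetscCall(MatDestroy(&Aredist));
  PetscCall(PetscFinalize());
  return 0;
}
```

The design point is that the matrix already lives on the larger communicator from the start; MatCreateSubMatrix() only changes which rank owns which rows, which is what the suggestion above relies on.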
>>> >>> Thanks in advance, >>> >>> Damyn Chipman >>> Boise State University >>> PhD Candidate >>> Computational Sciences and Engineering >>> damynchipman at u.boisestate.edu > From damynchipman at u.boisestate.edu Wed Oct 25 15:38:44 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Wed, 25 Oct 2023 14:38:44 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: <0500A373-0274-42C2-B127-5D964D203F44@u.boisestate.edu> More like smaller pieces that need to be combined. Combining them (merging) means sharing the actual data across a sibling communicator and doing some linear algebra to compute the merged matrices (it involves computing a Schur complement of a combined system from the sibling matrices). The solver is based on the Hierarchical Poincar?-Steklov (HPS) method, a direct method for solving elliptic PDEs. I had a conversation with Richard at this year?s ATPESC2023 about this idea. For some more context, here?s the test routine I wrote based on the MatCreateSubMatrix idea. The actual implementation would be part of a recursive merge up a quadtree. Each node's communicator would be a sub-communicator of its parent, and so on. I want to spread the data and compute across any ranks that are involved in that node?s merging. The sizes involved start ?small? at each leaf node (say, no more than 256x256), then are essentially doubled up the tree to the root node. ``` void TEST_petsc_matrix_comm() { // Create local matrices on local communicator int M_local = 4; int N_local = 4; Mat mat_local; MatCreate(MPI_COMM_SELF, &mat_local); // Note the MPI_COMM_SELF as a substitute for a sub-communicator of MPI_COMM_WORLD MatSetSizes(mat_local, PETSC_DECIDE, PETSC_DECIDE, M_local, N_local); MatSetFromOptions(mat_local); // Set values in local matrix int* row_indices = (int*) malloc(M_local*sizeof(int)); for (int i = 0; i < M_local; i++) { row_indices[i] = i; } int* col_indices = (int*) malloc(N_local*sizeof(int)); for (int j = 0; j < M_local; j++) {; col_indices[j] = j; } double* values = (double*) malloc(M_local*N_local*sizeof(double)); int v = M_local*N_local*rank; for (int j = 0; j < N_local; j++) { for (int i = 0; i < M_local; i++) { values[i + j*N_local] = (double) v; v++; } } MatSetValues(mat_local, M_local, row_indices, N_local, col_indices, values, INSERT_VALUES); MatAssemblyBegin(mat_local, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(mat_local, MAT_FINAL_ASSEMBLY); // Create local matrices on global communicator Mat mat_global; IS is_row; int idx[4] = {0, 1, 2, 3}; ISCreateGeneral(MPI_COMM_WORLD, M_local, idx, PETSC_COPY_VALUES, &is_row); MatCreateSubMatrix(mat_local, is_row, NULL, MAT_INITIAL_MATRIX, &mat_global); // View each local mat on global communicator (sleep for `rank` seconds so output is ordered) sleep(rank); MatView(mat_global, 0); // Create merged mat on global communicator // For this test, I just put the four locally computed matrices on the diagonal of the merged matrix // In the 4-to-1 merge, this would compute T_merged from T_alpha, T_beta, T_gamma, and T_omega (children) int M_merged = M_local*size; int N_merged = N_local*size; Mat mat_merged; MatCreate(MPI_COMM_WORLD, &mat_merged); MatSetSizes(mat_merged, PETSC_DECIDE, PETSC_DECIDE, M_merged, N_merged); MatSetFromOptions(mat_merged); // Get values of local matrix to put on diagonal double* values_diag = (double*) 
malloc(M_local*N_local*sizeof(double)); MatGetValues(mat_global, M_local, row_indices, N_local, col_indices, values_diag); // Put local matrix contributions into merged matrix (placeholder for computing merged matrix) for (int i = 0; i < M_local; i++) { row_indices[i] = i + M_local*rank; } for (int j = 0; j < N_local; j++) { col_indices[j] = j + N_local*rank; } MatSetValues(mat_merged, M_local, row_indices, N_local, col_indices, values_diag, INSERT_VALUES); MatAssemblyBegin(mat_merged, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(mat_merged, MAT_FINAL_ASSEMBLY); // View merged mat on global communicator sleep(rank); MatView(mat_merged, 0); // Clean up free(row_indices); free(col_indices); free(values); free(values_diag); MatDestroy(&mat_local); MatDestroy(&mat_global); MatDestroy(&mat_merged); } ``` With the following output : ``` (base) ? mpi git:(feature-parallel) ? mpirun -n 4 ./mpi_matrix Mat Object: 1 MPI process type: seqaij row 0: (0, 0.) (1, 1.) (2, 2.) (3, 3.) row 1: (0, 4.) (1, 5.) (2, 6.) (3, 7.) row 2: (0, 8.) (1, 9.) (2, 10.) (3, 11.) row 3: (0, 12.) (1, 13.) (2, 14.) (3, 15.) Mat Object: 1 MPI process type: seqaij row 0: (0, 16.) (1, 17.) (2, 18.) (3, 19.) row 1: (0, 20.) (1, 21.) (2, 22.) (3, 23.) row 2: (0, 24.) (1, 25.) (2, 26.) (3, 27.) row 3: (0, 28.) (1, 29.) (2, 30.) (3, 31.) Mat Object: 1 MPI process type: seqaij row 0: (0, 32.) (1, 33.) (2, 34.) (3, 35.) row 1: (0, 36.) (1, 37.) (2, 38.) (3, 39.) row 2: (0, 40.) (1, 41.) (2, 42.) (3, 43.) row 3: (0, 44.) (1, 45.) (2, 46.) (3, 47.) Mat Object: 1 MPI process type: seqaij row 0: (0, 48.) (1, 49.) (2, 50.) (3, 51.) row 1: (0, 52.) (1, 53.) (2, 54.) (3, 55.) row 2: (0, 56.) (1, 57.) (2, 58.) (3, 59.) row 3: (0, 60.) (1, 61.) (2, 62.) (3, 63.) Mat Object: 4 MPI processes type: mpiaij row 0: (0, 0.) (1, 1.) (2, 2.) (3, 3.) row 1: (0, 4.) (1, 5.) (2, 6.) (3, 7.) row 2: (0, 8.) (1, 9.) (2, 10.) (3, 11.) row 3: (0, 12.) (1, 13.) (2, 14.) (3, 15.) row 4: (4, 16.) (5, 17.) (6, 18.) (7, 19.) row 5: (4, 20.) (5, 21.) (6, 22.) (7, 23.) row 6: (4, 24.) (5, 25.) (6, 26.) (7, 27.) row 7: (4, 28.) (5, 29.) (6, 30.) (7, 31.) row 8: (8, 32.) (9, 33.) (10, 34.) (11, 35.) row 9: (8, 36.) (9, 37.) (10, 38.) (11, 39.) row 10: (8, 40.) (9, 41.) (10, 42.) (11, 43.) row 11: (8, 44.) (9, 45.) (10, 46.) (11, 47.) row 12: (12, 48.) (13, 49.) (14, 50.) (15, 51.) row 13: (12, 52.) (13, 53.) (14, 54.) (15, 55.) row 14: (12, 56.) (13, 57.) (14, 58.) (15, 59.) row 15: (12, 60.) (13, 61.) (14, 62.) (15, 63.) ``` -Damyn > On Oct 25, 2023, at 1:47?PM, Barry Smith wrote: > > > If the matrices are stored as dense it is likely new code is the best way to go. > > What pieces live on the sub communicator? Is it an m by N matrix where m is the number of rows (on that rank) and N is the total number of columns in the final matrix? Or are they smaller "chunks" that need to be combined together? > > Barry > > >> On Oct 25, 2023, at 3:38?PM, Damyn Chipman wrote: >> >> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? 
>> >> -Damyn >> >>> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >>> >>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>> >>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>> >>> Damyn Chipman writes: >>> >>>> Hi PETSc developers, >>>> >>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>> >>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>> >>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>> >>>> Thanks in advance, >>>> >>>> Damyn Chipman >>>> Boise State University >>>> PhD Candidate >>>> Computational Sciences and Engineering >>>> damynchipman at u.boisestate.edu >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Thu Oct 26 07:58:14 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Thu, 26 Oct 2023 07:58:14 -0500 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays Message-ID: Hello, I am trying to incorporate PETSc as a linear solver to compute Ax=b in my code. Currently, the sequential version works. 1) I have the global matrix A in CSR format and they are stored in three 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. 2) I am trying to use multicores, and when I use "srun -n 6", I got the error *Comm must be of size 1* from the MatCreateSeqAIJWithArrays. Saying I cannot use SEQ function in a parallel context. 3) I don't think MatCreateMPIAIJWithArrays and MatMPIAIJSetPreallocationCSR are good options for me, since I already have the global matrix as a whole. I wonder, from the global CSR format data, how can I reach the PETSc format matrix for parallel KSP computation. Are the MatSetValue, MatSetValues what I need? Thanks, Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Oct 26 09:08:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 26 Oct 2023 09:08:50 -0500 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays In-Reply-To: References: Message-ID: On Thu, Oct 26, 2023 at 8:21?AM Qiyue Lu wrote: > Hello, > I am trying to incorporate PETSc as a linear solver to compute Ax=b in my > code. Currently, the sequential version works. > 1) I have the global matrix A in CSR format and they are stored in three > 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using > MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. > 2) I am trying to use multicores, and when I use "srun -n 6", I got the > error *Comm must be of size 1* from the MatCreateSeqAIJWithArrays. 
Saying > I cannot use SEQ function in a parallel context. > 3) I don't think MatCreateMPIAIJWithArrays and > MatMPIAIJSetPreallocationCSR are good options for me, since I already have > the global matrix as a whole. > > I wonder, from the global CSR format data, how can I reach the PETSc > format matrix for parallel KSP computation. Are the MatSetValue, > MatSetValues what I need? > Yes, MatSetValues on each row. Your matrix data is originally on one process, which is not efficient. You could try to distribute it at the beginning. > > Thanks, > Qiyue Lu > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 26 09:30:20 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 26 Oct 2023 10:30:20 -0400 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays In-Reply-To: References: Message-ID: <9593F9C1-178E-4CC3-8BFC-2EDAD29ABC05@petsc.dev> Is your code sequential (with possibly OpenMP) or MPI parallel? Do you plan to make your part of the code MPI parallel? If it is sequential or OpenMP parallel you might consider using the new feature https://petsc.org/release/manualpages/PC/PCMPI/#pcmpi Depending on your system it is an easy way to run linear solver in parallel while the code is sequential and can give some reasonable speedup. > On Oct 26, 2023, at 8:58?AM, Qiyue Lu wrote: > > Hello, > I am trying to incorporate PETSc as a linear solver to compute Ax=b in my code. Currently, the sequential version works. > 1) I have the global matrix A in CSR format and they are stored in three 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. > 2) I am trying to use multicores, and when I use "srun -n 6", I got the error Comm must be of size 1 from the MatCreateSeqAIJWithArrays. Saying I cannot use SEQ function in a parallel context. > 3) I don't think MatCreateMPIAIJWithArrays and MatMPIAIJSetPreallocationCSR are good options for me, since I already have the global matrix as a whole. > > I wonder, from the global CSR format data, how can I reach the PETSc format matrix for parallel KSP computation. Are the MatSetValue, MatSetValues what I need? > > Thanks, > Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Thu Oct 26 09:35:37 2023 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Thu, 26 Oct 2023 14:35:37 +0000 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: References: Message-ID: Hello, Here is a very simple version where I have issues. Which I run as follows: cd Grid_generation make clean make all ./grid_generation cd .. 
make clean make all ./cobpor # on 1 proc # OR mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs The error that I get is the following: munmap_chunk(): invalid pointer [cns266:2552391] *** Process received signal *** [cns266:2552391] Signal: Aborted (6) [cns266:2552391] Signal code: (-6) [cns266:2552391] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7fd7fd194b20] [cns266:2552391] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fd7fd194a9f] [cns266:2552391] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fd7fd167e05] [cns266:2552391] [ 3] /lib64/libc.so.6(+0x91037)[0x7fd7fd1d7037] [cns266:2552391] [ 4] /lib64/libc.so.6(+0x9819c)[0x7fd7fd1de19c] [cns266:2552391] [ 5] /lib64/libc.so.6(+0x9844c)[0x7fd7fd1de44c] [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e] [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetMatType+0x3d)[0x7fd7feab87ad] [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59] [cns266:2552391] [ 9] ./cobpor[0x402df9] [cns266:2552391] [10] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fd7fd180cf3] [cns266:2552391] [11] ./cobpor[0x40304e] [cns266:2552391] *** End of error message *** Thanks a lot for your help. Best regards, Joauma De : Matthew Knepley Date : mercredi, 25 octobre 2023 ? 14:45 ? : Joauma Marichal Cc : petsc-maint at mcs.anl.gov , petsc-users at mcs.anl.gov Objet : Re: [petsc-maint] DMSwarm on multiple processors On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint > wrote: Hello, I am using the DMSwarm library in some Eulerian-Lagrangian approach to have vapor bubbles in water. I have obtained nice results recently and wanted to perform bigger simulations. 
Unfortunately, when I increase the number of processors used to run the simulation, I get the following error: free(): invalid size [cns136:590327] *** Process received signal *** [cns136:590327] Signal: Aborted (6) [cns136:590327] Signal code: (-6) [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642] [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] [cns136:590327] [13] ./cobpor[0x4418dc] [cns136:590327] [14] ./cobpor[0x408b63] [cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] [cns136:590327] [16] ./cobpor[0x40bdee] [cns136:590327] *** End of error message *** -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted). -------------------------------------------------------------------------- When I reduce the number of processors the error disappears and when I run my code without the vapor bubbles it also works. The problem seems to take place at this moment: DMCreate(PETSC_COMM_WORLD,swarm); DMSetType(*swarm,DMSWARM); DMSetDimension(*swarm,3); DMSwarmSetType(*swarm,DMSWARM_PIC); DMSwarmSetCellDM(*swarm,*dmcell); Thanks a lot for your help. Things that would help us track this down: 1) The smallest example where it fails 2) The smallest number of processes where it fails 3) A stack trace of the failure 4) A simple example that we can run that also fails Thanks, Matt Best regards, Joauma -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joauma.marichal at uclouvain.be Thu Oct 26 09:36:31 2023 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Thu, 26 Oct 2023 14:36:31 +0000 Subject: [petsc-users] =?iso-8859-1?q?Joauma_Marichal_a_partag=E9_le_doss?= =?iso-8859-1?q?ier_=AB=A0marha=A0=BB_avec_vous?= Message-ID: [Partager l'image] Joauma Marichal a partag? 
un dossier avec vous Joauma Marichal a partag? ce dossier avec vous. [icon] marha [permission globe icon] Ce lien ne fonctionne que pour les destinataires directs de ce message. Ouvrir [Microsoft logo] [cid:faf45f49-2eb0-45c1-831d-d87e9a739e5c] D?claration de confidentialit? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 2877 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 560 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 2133 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 5135 bytes Desc: AttachedImage URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AttachedImage Type: image/png Size: 3404 bytes Desc: AttachedImage URL: From bsmith at petsc.dev Thu Oct 26 09:58:55 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 26 Oct 2023 10:58:55 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: References: Message-ID: <595BCCE5-C4DD-4B7E-B7E7-9D07961DD24D@petsc.dev> Please run with -malloc_debug option or even better run under Valgrind https://petsc.org/release/faq/ > On Oct 26, 2023, at 10:35?AM, Joauma Marichal via petsc-users wrote: > > Hello, > > Here is a very simple version where I have issues. > > Which I run as follows: > > cd Grid_generation > make clean > make all > ./grid_generation > cd .. > make clean > make all > ./cobpor # on 1 proc > # OR > mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs > > The error that I get is the following: > munmap_chunk(): invalid pointer > [cns266:2552391] *** Process received signal *** > [cns266:2552391] Signal: Aborted (6) > [cns266:2552391] Signal code: (-6) > [cns266:2552391] [ 0] /lib64/libc.so .6(+0x4eb20)[0x7fd7fd194b20] > [cns266:2552391] [ 1] /lib64/libc.so .6(gsignal+0x10f)[0x7fd7fd194a9f] > [cns266:2552391] [ 2] /lib64/libc.so .6(abort+0x127)[0x7fd7fd167e05] > [cns266:2552391] [ 3] /lib64/libc.so .6(+0x91037)[0x7fd7fd1d7037] > [cns266:2552391] [ 4] /lib64/libc.so .6(+0x9819c)[0x7fd7fd1de19c] > [cns266:2552391] [ 5] /lib64/libc.so .6(+0x9844c)[0x7fd7fd1de44c] > [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e] > [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetMatType+0x3d)[0x7fd7feab87ad] > [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59] > [cns266:2552391] [ 9] ./cobpor[0x402df9] > [cns266:2552391] [10] /lib64/libc.so .6(__libc_start_main+0xf3)[0x7fd7fd180cf3] > [cns266:2552391] [11] ./cobpor[0x40304e] > [cns266:2552391] *** End of error message *** > > > Thanks a lot for your help. > > Best regards, > > Joauma > > > > De : Matthew Knepley > > Date : mercredi, 25 octobre 2023 ? 14:45 > ? 
: Joauma Marichal > > Cc : petsc-maint at mcs.anl.gov >, petsc-users at mcs.anl.gov > > Objet : Re: [petsc-maint] DMSwarm on multiple processors > > On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint > wrote: > Hello, > > I am using the DMSwarm library in some Eulerian-Lagrangian approach to have vapor bubbles in water. > I have obtained nice results recently and wanted to perform bigger simulations. Unfortunately, when I increase the number of processors used to run the simulation, I get the following error: > > free(): invalid size > > [cns136:590327] *** Process received signal *** > > [cns136:590327] Signal: Aborted (6) > > [cns136:590327] Signal code: (-6) > > [cns136:590327] [ 0] /lib64/libc.so .6(+0x4eb20)[0x7f56cd4c9b20] > > [cns136:590327] [ 1] /lib64/libc.so .6(gsignal+0x10f)[0x7f56cd4c9a9f] > > [cns136:590327] [ 2] /lib64/libc.so .6(abort+0x127)[0x7f56cd49ce05] > > [cns136:590327] [ 3] /lib64/libc.so .6(+0x91037)[0x7f56cd50c037] > > [cns136:590327] [ 4] /lib64/libc.so .6(+0x9819c)[0x7f56cd51319c] > > [cns136:590327] [ 5] /lib64/libc.so .6(+0x99aac)[0x7f56cd514aac] > > [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] > > [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(+0x841642)[0x7f56cea83642] > > [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] > > [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] > > [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] > > [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] > > [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so .3.019(DMSetUp+0x20)[0x7f56cededa20] > > [cns136:590327] [13] ./cobpor[0x4418dc] > > [cns136:590327] [14] ./cobpor[0x408b63] > > [cns136:590327] [15] /lib64/libc.so .6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] > > [cns136:590327] [16] ./cobpor[0x40bdee] > > [cns136:590327] *** End of error message *** > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted). > > -------------------------------------------------------------------------- > > > When I reduce the number of processors the error disappears and when I run my code without the vapor bubbles it also works. > The problem seems to take place at this moment: > > DMCreate(PETSC_COMM_WORLD,swarm); > DMSetType(*swarm,DMSWARM); > DMSetDimension(*swarm,3); > DMSwarmSetType(*swarm,DMSWARM_PIC); > DMSwarmSetCellDM(*swarm,*dmcell); > > > Thanks a lot for your help. 
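Independent of where the invalid pointer comes from, checking every PETSc call usually turns this kind of abort into a readable PETSc error with a stack trace, which complements the -malloc_debug / Valgrind advice above. A sketch of the swarm setup from the quoted code with error checking added; dmcell is assumed to be a cell DM (for example a DMDA) that has already been created and set up.

```c
#include <petscdmswarm.h>

/* Error-checked version of the swarm setup quoted in this thread;
 * dmcell is assumed to be an existing, fully set-up cell DM. */
static PetscErrorCode CreateSwarm(DM dmcell, DM *swarm)
{
  PetscFunctionBeginUser;
  PetscCall(DMCreate(PETSC_COMM_WORLD, swarm));
  PetscCall(DMSetType(*swarm, DMSWARM));
  PetscCall(DMSetDimension(*swarm, 3));
  PetscCall(DMSwarmSetType(*swarm, DMSWARM_PIC));
  PetscCall(DMSwarmSetCellDM(*swarm, dmcell));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```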
> > Things that would help us track this down: > > 1) The smallest example where it fails > > 2) The smallest number of processes where it fails > > 3) A stack trace of the failure > > 4) A simple example that we can run that also fails > > Thanks, > > Matt > > Best regards, > > Joauma > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 26 11:01:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 26 Oct 2023 12:01:42 -0400 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman < damynchipman at u.boisestate.edu> wrote: > Great thanks, that seemed to work well. This is something my algorithm > will do fairly often (?elevating? a node?s communicator to a communicator > that includes siblings). The matrices formed are dense but low rank. With > MatCreateSubMatrix, it appears I do a lot of copying from one Mat to > another. Is there a way to do it with array copying or pointer movement > instead of copying entries? > We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. Thanks, Matt > -Damyn > > On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: > > You can place it in a parallel Mat (that has rows or columns on only one > rank or a subset of ranks) and then MatCreateSubMatrix with all new > rows/columns on a different rank or subset of ranks. > > That said, you usually have a function that assembles the matrix and you > can just call that on the new communicator. > > Damyn Chipman writes: > > Hi PETSc developers, > > In short, my question is this: Does PETSc provide a way to move or copy an > object (say a Mat) from one communicator to another? > > The more detailed scenario is this: I?m working on a linear algebra solver > on quadtree meshes (i.e., p4est). I use communicator subsets in order to > facilitate communication between siblings or nearby neighbors. When > performing linear algebra across siblings (a group of 4), I need to copy a > node?s data (i.e., a Mat object) from a sibling?s communicator to the > communicator that includes the four siblings. From what I can tell, I can > only copy a PETSc object onto the same communicator. > > My current approach will be to copy the raw data from the Mat on one > communicator to a new Mat on the new communicator, but I wanted to see if > there is a more ?elegant? approach within PETSc. > > Thanks in advance, > > Damyn Chipman > Boise State University > PhD Candidate > Computational Sciences and Engineering > damynchipman at u.boisestate.edu > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Oct 26 14:34:49 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 26 Oct 2023 15:34:49 -0400 Subject: [petsc-users] [petsc-maint] DMSwarm on multiple processors In-Reply-To: <595BCCE5-C4DD-4B7E-B7E7-9D07961DD24D@petsc.dev> References: <595BCCE5-C4DD-4B7E-B7E7-9D07961DD24D@petsc.dev> Message-ID: Okay, there were a few problems: 1) You overwrote the bounds on string loc_grid_gen[] 2) You destroyed the coordinate DA I fixed these and it runs for me fine on several processes. I am including my revised source since I check a lot more error values. I converted it to C because that is easier for me, although C has a problem with your sqrt() in a compile-time constant. Thanks, Matt On Thu, Oct 26, 2023 at 10:59?AM Barry Smith wrote: > > Please run with -malloc_debug option or even better run under Valgrind > https://petsc.org/release/faq/ > > > > On Oct 26, 2023, at 10:35?AM, Joauma Marichal via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > Here is a very simple version where I have issues. > > Which I run as follows: > > cd Grid_generation > make clean > make all > ./grid_generation > cd .. > make clean > make all > ./cobpor # on 1 proc > # OR > mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct > -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs > > The error that I get is the following: > munmap_chunk(): invalid pointer > [cns266:2552391] *** Process received signal *** > [cns266:2552391] Signal: Aborted (6) > [cns266:2552391] Signal code: (-6) > [cns266:2552391] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7fd7fd194b20] > [cns266:2552391] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fd7fd194a9f] > [cns266:2552391] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fd7fd167e05] > [cns266:2552391] [ 3] /lib64/libc.so.6(+0x91037)[0x7fd7fd1d7037] > [cns266:2552391] [ 4] /lib64/libc.so.6(+0x9819c)[0x7fd7fd1de19c] > [cns266:2552391] [ 5] /lib64/libc.so.6(+0x9844c)[0x7fd7fd1de44c] > [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e] > [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetMatType+0x3d)[0x7fd7feab87ad] > [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59] > [cns266:2552391] [ 9] ./cobpor[0x402df9] > [cns266:2552391] [10] /lib64/libc.so > .6(__libc_start_main+0xf3)[0x7fd7fd180cf3] > [cns266:2552391] [11] ./cobpor[0x40304e] > [cns266:2552391] *** End of error message *** > > > Thanks a lot for your help. > > Best regards, > > Joauma > > > > > *De : *Matthew Knepley > *Date : *mercredi, 25 octobre 2023 ? 14:45 > *? : *Joauma Marichal > *Cc : *petsc-maint at mcs.anl.gov , > petsc-users at mcs.anl.gov > *Objet : *Re: [petsc-maint] DMSwarm on multiple processors > On Wed, Oct 25, 2023 at 8:32?AM Joauma Marichal via petsc-maint < > petsc-maint at mcs.anl.gov> wrote: > > Hello, > > I am using the DMSwarm library in some Eulerian-Lagrangian approach to > have vapor bubbles in water. > I have obtained nice results recently and wanted to perform bigger > simulations. 
Unfortunately, when I increase the number of processors used > to run the simulation, I get the following error: > > > free(): invalid size > > [cns136:590327] *** Process received signal *** > > [cns136:590327] Signal: Aborted (6) > > [cns136:590327] Signal code: (-6) > > [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20] > > [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f] > > [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05] > > [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037] > > [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c] > > [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac] > > [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64] > > [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(+0x841642)[0x7f56cea83642] > > [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e] > > [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde] > > [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8] > > [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448] > > [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/ > libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20] > > [cns136:590327] [13] ./cobpor[0x4418dc] > > [cns136:590327] [14] ./cobpor[0x408b63] > > [cns136:590327] [15] /lib64/libc.so > .6(__libc_start_main+0xf3)[0x7f56cd4b5cf3] > > [cns136:590327] [16] ./cobpor[0x40bdee] > > [cns136:590327] *** End of error message *** > > -------------------------------------------------------------------------- > > Primary job terminated normally, but 1 process returned > > a non-zero exit code. Per user-direction, the job has been aborted. > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited > on signal 6 (Aborted). > > -------------------------------------------------------------------------- > > When I reduce the number of processors the error disappears and when I run > my code without the vapor bubbles it also works. > The problem seems to take place at this moment: > > DMCreate(PETSC_COMM_WORLD,swarm); > DMSetType(*swarm,DMSWARM); > DMSetDimension(*swarm,3); > DMSwarmSetType(*swarm,DMSWARM_PIC); > DMSwarmSetCellDM(*swarm,*dmcell); > > > Thanks a lot for your help. > > > Things that would help us track this down: > > 1) The smallest example where it fails > > 2) The smallest number of processes where it fails > > 3) A stack trace of the failure > > 4) A simple example that we can run that also fails > > Thanks, > > Matt > > > Best regards, > > Joauma > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: global.h Type: application/octet-stream Size: 11245 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: global_evap.h Type: application/octet-stream Size: 1099 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main.c Type: application/octet-stream Size: 16931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile_matt Type: application/octet-stream Size: 135 bytes Desc: not available URL: From damian.marek at mail.utoronto.ca Fri Oct 27 08:12:34 2023 From: damian.marek at mail.utoronto.ca (Damian Marek) Date: Fri, 27 Oct 2023 13:12:34 +0000 Subject: [petsc-users] MatDenseSetLDA Documentation Clarification Message-ID: Hello, I found a minor issue with the documentation for MatDenseSetLDA, which ended up causing my program to idle unexpectedly. In the documentation it is specified as "Not Collective". However, in the MPIDense implementation it is possible for PetscLayoutSetUp to be called, which is a collective. (Lines 201-202: https://petsc.org/main/src/mat/impls/dense/mpi/mpidense.c.html#MatDenseSetLDA_MPIDense) Regards, Damian -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 27 10:11:59 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 27 Oct 2023 11:11:59 -0400 Subject: [petsc-users] MatDenseSetLDA Documentation Clarification In-Reply-To: References: Message-ID: <61F224FC-8C7E-430D-A00C-FC0167C54F38@petsc.dev> Damian, Thanks for the report. Fixing in https://gitlab.com/petsc/petsc/-/merge_requests/6972 Barry > On Oct 27, 2023, at 9:12?AM, Damian Marek wrote: > > Hello, > > I found a minor issue with the documentation for MatDenseSetLDA, which ended up causing my program to idle unexpectedly. In the documentation it is specified as "Not Collective". However, in the MPIDense implementation it is possible for PetscLayoutSetUp to be called, which is a collective. (Lines 201-202: https://petsc.org/main/src/mat/impls/dense/mpi/mpidense.c.html#MatDenseSetLDA_MPIDense) > > Regards, > Damian -------------- next part -------------- An HTML attachment was scrubbed... URL: From damynchipman at u.boisestate.edu Fri Oct 27 14:53:56 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Fri, 27 Oct 2023 13:53:56 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> Message-ID: <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Yeah, I?ll make an issue and use a modified version of this test routine. Does anything change if I will be using MATSCALAPACK matrices instead of the built in MATDENSE? Like I said, I will be computing Schur complements and need to use a parallel and dense matrix format. -Damyn > On Oct 26, 2023, at 10:01?AM, Matthew Knepley wrote: > > On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman > wrote: >> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). 
The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? > > We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. > > Thanks, > > Matt > >> -Damyn >> >>> On Oct 24, 2023, at 9:51?PM, Jed Brown > wrote: >>> >>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>> >>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>> >>> Damyn Chipman > writes: >>> >>>> Hi PETSc developers, >>>> >>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>> >>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>> >>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>> >>>> Thanks in advance, >>>> >>>> Damyn Chipman >>>> Boise State University >>>> PhD Candidate >>>> Computational Sciences and Engineering >>>> damynchipman at u.boisestate.edu >> > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sat Oct 28 02:35:33 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 28 Oct 2023 09:35:33 +0200 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Message-ID: <7281ED63-0D4B-4A56-97DA-40781D27D856@dsic.upv.es> Currently MATSCALAPACK does not support MatCreateSubMatrix(). I guess it would not be difficult to implement. Jose > El 27 oct 2023, a las 21:53, Damyn Chipman escribi?: > > Yeah, I?ll make an issue and use a modified version of this test routine. > > Does anything change if I will be using MATSCALAPACK matrices instead of the built in MATDENSE? Like I said, I will be computing Schur complements and need to use a parallel and dense matrix format. > > -Damyn > >> On Oct 26, 2023, at 10:01?AM, Matthew Knepley wrote: >> >> On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman wrote: >> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? 
a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? >> >> We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. >> >> Thanks, >> >> Matt >> >> -Damyn >> >>> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >>> >>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>> >>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>> >>> Damyn Chipman writes: >>> >>>> Hi PETSc developers, >>>> >>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>> >>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>> >>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>> >>>> Thanks in advance, >>>> >>>> Damyn Chipman >>>> Boise State University >>>> PhD Candidate >>>> Computational Sciences and Engineering >>>> damynchipman at u.boisestate.edu >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > From qiyuelu1 at gmail.com Sat Oct 28 09:20:16 2023 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Sat, 28 Oct 2023 09:20:16 -0500 Subject: [petsc-users] alternative for MatCreateSeqAIJWithArrays In-Reply-To: <9593F9C1-178E-4CC3-8BFC-2EDAD29ABC05@petsc.dev> References: <9593F9C1-178E-4CC3-8BFC-2EDAD29ABC05@petsc.dev> Message-ID: Yes, this is exactly what I need. And it works now. For the record to other potential users: 1) PetscInitialize and PCMPIServerBegin looping start 2) Sequential code part, and as Junchao said, MatSetValues on each row to create matrices. 3) parallel KSPSolve looping end 4) PCMPIServerEnd and PetscFinalize Thank you all for these suggestions. Have a good weekend. Qiyue Lu On Thu, Oct 26, 2023 at 9:30?AM Barry Smith wrote: > > Is your code sequential (with possibly OpenMP) or MPI parallel? Do you > plan to make your part of the code MPI parallel? > > If it is sequential or OpenMP parallel you might consider using the > new feature https://petsc.org/release/manualpages/PC/PCMPI/#pcmpi Depending > on your system it is an easy way to run linear solver in parallel while the > code is sequential and can give some reasonable speedup. 
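For other readers, a rough sketch of the row-by-row assembly Junchao suggested earlier (step 2 in the summary above). It assumes the global CSR arrays row_ptr, col_idx, and values are visible on every rank and use PetscInt/PetscScalar-compatible types, and that n is the global size; if only one rank holds the CSR data, that rank can set all rows instead and the assembly will communicate the values. With the PCMPI server route the same loop simply stays on PETSC_COMM_SELF on rank 0.

```c
/* Sketch: build a parallel AIJ matrix row by row from global CSR arrays and
 * solve with KSP. Preallocation is skipped for brevity; for real problems add
 * MatMPIAIJSetPreallocation() or pass the nonzero counts up front. */
Mat      A;
Vec      x, b;
KSP      ksp;
PetscInt rstart, rend;

PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
PetscCall(MatSetFromOptions(A));
PetscCall(MatSetUp(A));
PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
for (PetscInt i = rstart; i < rend; i++) {
  PetscInt ncols = row_ptr[i + 1] - row_ptr[i];
  PetscCall(MatSetValues(A, 1, &i, ncols, &col_idx[row_ptr[i]], &values[row_ptr[i]], INSERT_VALUES));
}
PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

PetscCall(MatCreateVecs(A, &x, &b));
/* ... fill b with the right-hand side ... */
PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
PetscCall(KSPSetOperators(ksp, A, A));
PetscCall(KSPSetFromOptions(ksp));
PetscCall(KSPSolve(ksp, b, x));
```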
> > On Oct 26, 2023, at 8:58?AM, Qiyue Lu wrote: > > Hello, > I am trying to incorporate PETSc as a linear solver to compute Ax=b in my > code. Currently, the sequential version works. > 1) I have the global matrix A in CSR format and they are stored in three > 1-dimensional arrays: row_ptr[ ], col_idx[ ], values[ ], and I am using > MatCreateSeqAIJWithArrays to get the PETSc format matrix. This works. > 2) I am trying to use multicores, and when I use "srun -n 6", I got the > error *Comm must be of size 1* from the MatCreateSeqAIJWithArrays. Saying > I cannot use SEQ function in a parallel context. > 3) I don't think MatCreateMPIAIJWithArrays and > MatMPIAIJSetPreallocationCSR are good options for me, since I already have > the global matrix as a whole. > > I wonder, from the global CSR format data, how can I reach the PETSc > format matrix for parallel KSP computation. Are the MatSetValue, > MatSetValues what I need? > > Thanks, > Qiyue Lu > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 28 09:56:47 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 28 Oct 2023 10:56:47 -0400 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Message-ID: On Fri, Oct 27, 2023 at 3:54?PM Damyn Chipman wrote: > Yeah, I?ll make an issue and use a modified version of this test routine. > > Does anything change if I will be using MATSCALAPACK matrices instead of > the built in MATDENSE? > No, that is likely worse. > Like I said, I will be computing Schur complements and need to use a > parallel and dense matrix format. > I do not understand the communication pattern, but it is possible that Elemental would be slightly faster since it has some cool built-in communication operations, however it might be more programming. Thanks, Matt > -Damyn > > On Oct 26, 2023, at 10:01?AM, Matthew Knepley wrote: > > On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman < > damynchipman at u.boisestate.edu> wrote: > >> Great thanks, that seemed to work well. This is something my algorithm >> will do fairly often (?elevating? a node?s communicator to a communicator >> that includes siblings). The matrices formed are dense but low rank. With >> MatCreateSubMatrix, it appears I do a lot of copying from one Mat to >> another. Is there a way to do it with array copying or pointer movement >> instead of copying entries? >> > > We could make a fast path for dense that avoids MatSetValues(). Can you > make an issue for this? The number one thing that would make this faster is > to contribute a small test. Then we could run it continually when putting > in the fast path to make sure we are preserving correctness. > > Thanks, > > Matt > > >> -Damyn >> >> On Oct 24, 2023, at 9:51?PM, Jed Brown wrote: >> >> You can place it in a parallel Mat (that has rows or columns on only one >> rank or a subset of ranks) and then MatCreateSubMatrix with all new >> rows/columns on a different rank or subset of ranks. >> >> That said, you usually have a function that assembles the matrix and you >> can just call that on the new communicator. 
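One concrete shape for that second suggestion is to make the assembly routine take the communicator as an argument, so "moving" an operator to the sibling group is just re-running assembly there. The routine name, the dense type, and the NULL data argument below are placeholders for illustration.

```c
/* Sketch: parameterize assembly by communicator, so the same routine can build
 * the operator on a single sibling's communicator or on the merged one. */
static PetscErrorCode AssembleNodeMatrix(MPI_Comm comm, PetscInt M, PetscInt N, Mat *T)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreateDense(comm, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, T));
  /* ... fill with MatSetValues() from the node's local data ... */
  PetscCall(MatAssemblyBegin(*T, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*T, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```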
>> >> Damyn Chipman writes: >> >> Hi PETSc developers, >> >> In short, my question is this: Does PETSc provide a way to move or copy >> an object (say a Mat) from one communicator to another? >> >> The more detailed scenario is this: I?m working on a linear algebra >> solver on quadtree meshes (i.e., p4est). I use communicator subsets in >> order to facilitate communication between siblings or nearby neighbors. >> When performing linear algebra across siblings (a group of 4), I need to >> copy a node?s data (i.e., a Mat object) from a sibling?s communicator to >> the communicator that includes the four siblings. From what I can tell, I >> can only copy a PETSc object onto the same communicator. >> >> My current approach will be to copy the raw data from the Mat on one >> communicator to a new Mat on the new communicator, but I wanted to see if >> there is a more ?elegant? approach within PETSc. >> >> Thanks in advance, >> >> Damyn Chipman >> Boise State University >> PhD Candidate >> Computational Sciences and Engineering >> damynchipman at u.boisestate.edu >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From onur.notonur at proton.me Mon Oct 30 04:37:21 2023 From: onur.notonur at proton.me (onur.notonur) Date: Mon, 30 Oct 2023 09:37:21 +0000 Subject: [petsc-users] Advices on creating DMPlex from custom input format Message-ID: Hi, I hope this message finds you all in good health and high spirits. I wanted to discuss an approach problem input file reading/processing in a solver which is using PETSc DMPlex. In our team we have a range of solvers, they are not built on PETSc except this one, but they all share a common problem input format. This format includes essential data such as node coordinates, element connectivity, boundary conditions based on elements, and specific metadata related to the problem. I create an array for boundary points on each rank and utilize them in our computations, I am doing it hardcoded currently but I need to start reading those input files, But I am not sure about the approach. Here's what I have in mind: - - Begin by reading the node coordinates and connectivity on a single core. - Utilize the DMPlexCreateFromCellListPetsc() function to construct the DMPlex. - Distribute the mesh across processors. - Proceed to read and process the boundary conditions on each processor. If the global index of the boundary element corresponds to that processor, we process it; otherwise, we pass. Additionally, maybe I need to reorder the mesh. In that case I think I can use the point permutation IS obtained from the DMPlexGetOrdering() function while processing boundary conditions. Also I have another approach in my mind but I don't know if it's possible: Read/construct DMPlex on single core including boundary conditions. Store BC related data in Vec or another appropriate data structure. Then distribute this BC holding data structure too as well as DMPlex. I would greatly appreciate your thoughts and any suggestions you might have regarding this approach. 
Looking forward to hearing your insights. Best regards, Onur -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Oct 30 11:34:51 2023 From: jed at jedbrown.org (Jed Brown) Date: Mon, 30 Oct 2023 10:34:51 -0600 Subject: [petsc-users] Advices on creating DMPlex from custom input format In-Reply-To: References: Message-ID: <874ji8jcgk.fsf@jedbrown.org> It's probably easier to apply boundary conditions when you have the serial mesh. You may consider contributing the reader if it's a format that others use. "onur.notonur via petsc-users" writes: > Hi, > > I hope this message finds you all in good health and high spirits. > > I wanted to discuss an approach problem input file reading/processing in a solver which is using PETSc DMPlex. In our team we have a range of solvers, they are not built on PETSc except this one, but they all share a common problem input format. This format includes essential data such as node coordinates, element connectivity, boundary conditions based on elements, and specific metadata related to the problem. I create an array for boundary points on each rank and utilize them in our computations, I am doing it hardcoded currently but I need to start reading those input files, But I am not sure about the approach. > > Here's what I have in mind: > > - - Begin by reading the node coordinates and connectivity on a single core. > - Utilize the DMPlexCreateFromCellListPetsc() function to construct the DMPlex. > - Distribute the mesh across processors. > - Proceed to read and process the boundary conditions on each processor. If the global index of the boundary element corresponds to that processor, we process it; otherwise, we pass. > > Additionally, maybe I need to reorder the mesh. In that case I think I can use the point permutation IS obtained from the DMPlexGetOrdering() function while processing boundary conditions. > > Also I have another approach in my mind but I don't know if it's possible: Read/construct DMPlex on single core including boundary conditions. Store BC related data in Vec or another appropriate data structure. Then distribute this BC holding data structure too as well as DMPlex. > > I would greatly appreciate your thoughts and any suggestions you might have regarding this approach. Looking forward to hearing your insights. > > Best regards, > > Onur From damynchipman at u.boisestate.edu Mon Oct 30 11:42:42 2023 From: damynchipman at u.boisestate.edu (Damyn Chipman) Date: Mon, 30 Oct 2023 10:42:42 -0600 Subject: [petsc-users] Copying PETSc Objects Across MPI Communicators In-Reply-To: References: <98B5B966-38A0-46F6-86EB-09353554F0DB@u.boisestate.edu> <87h6mfmka5.fsf@jedbrown.org> <55B8159A-96F1-49BE-ADE5-F6D40D036115@u.boisestate.edu> <53D5A2A5-6958-4EC9-ABA5-CBBE1FB5D65C@u.boisestate.edu> Message-ID: Sounds good, thanks. I?ve also been looking into Elemental, but the documentation seems outdated and I can?t find good examples on how to use it. I have the LLNL fork installed. Thanks, -Damyn > On Oct 28, 2023, at 8:56?AM, Matthew Knepley wrote: > > On Fri, Oct 27, 2023 at 3:54?PM Damyn Chipman > wrote: >> Yeah, I?ll make an issue and use a modified version of this test routine. >> >> Does anything change if I will be using MATSCALAPACK matrices instead of the built in MATDENSE? > > No, that is likely worse. > >> Like I said, I will be computing Schur complements and need to use a parallel and dense matrix format. 
> > I do not understand the communication pattern, but it is possible that Elemental would be slightly faster since it has some cool built-in communication operations, however it might be more programming. > > Thanks, > > Matt > >> -Damyn >> >>> On Oct 26, 2023, at 10:01?AM, Matthew Knepley > wrote: >>> >>> On Wed, Oct 25, 2023 at 11:55?PM Damyn Chipman > wrote: >>>> Great thanks, that seemed to work well. This is something my algorithm will do fairly often (?elevating? a node?s communicator to a communicator that includes siblings). The matrices formed are dense but low rank. With MatCreateSubMatrix, it appears I do a lot of copying from one Mat to another. Is there a way to do it with array copying or pointer movement instead of copying entries? >>> >>> We could make a fast path for dense that avoids MatSetValues(). Can you make an issue for this? The number one thing that would make this faster is to contribute a small test. Then we could run it continually when putting in the fast path to make sure we are preserving correctness. >>> >>> Thanks, >>> >>> Matt >>> >>>> -Damyn >>>> >>>>> On Oct 24, 2023, at 9:51?PM, Jed Brown > wrote: >>>>> >>>>> You can place it in a parallel Mat (that has rows or columns on only one rank or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on a different rank or subset of ranks. >>>>> >>>>> That said, you usually have a function that assembles the matrix and you can just call that on the new communicator. >>>>> >>>>> Damyn Chipman > writes: >>>>> >>>>>> Hi PETSc developers, >>>>>> >>>>>> In short, my question is this: Does PETSc provide a way to move or copy an object (say a Mat) from one communicator to another? >>>>>> >>>>>> The more detailed scenario is this: I?m working on a linear algebra solver on quadtree meshes (i.e., p4est). I use communicator subsets in order to facilitate communication between siblings or nearby neighbors. When performing linear algebra across siblings (a group of 4), I need to copy a node?s data (i.e., a Mat object) from a sibling?s communicator to the communicator that includes the four siblings. From what I can tell, I can only copy a PETSc object onto the same communicator. >>>>>> >>>>>> My current approach will be to copy the raw data from the Mat on one communicator to a new Mat on the new communicator, but I wanted to see if there is a more ?elegant? approach within PETSc. >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Damyn Chipman >>>>>> Boise State University >>>>>> PhD Candidate >>>>>> Computational Sciences and Engineering >>>>>> damynchipman at u.boisestate.edu >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Oct 30 12:16:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Oct 2023 13:16:56 -0400 Subject: [petsc-users] Advices on creating DMPlex from custom input format In-Reply-To: References: Message-ID: On Mon, Oct 30, 2023 at 5:37?AM onur.notonur via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I hope this message finds you all in good health and high spirits. > > I wanted to discuss an approach problem input file reading/processing in a > solver which is using PETSc DMPlex. In our team we have a range of solvers, > they are not built on PETSc except this one, but they all share a common > problem input format. This format includes essential data such as node > coordinates, element connectivity, boundary conditions based on elements, > and specific metadata related to the problem. I create an array for > boundary points on each rank and utilize them in our computations, I am > doing it hardcoded currently but I need to start reading those input files, > But I am not sure about the approach. > > Here's what I have in mind: > > 1. - Begin by reading the node coordinates and connectivity on a > single core. > - Utilize the DMPlexCreateFromCellListPetsc() function to construct > the DMPlex. > - Distribute the mesh across processors. > - Proceed to read and process the boundary conditions on each > processor. If the global index of the boundary element corresponds to that > processor, we process it; otherwise, we pass. > > Additionally, maybe I need to reorder the mesh. In that case I think I can > use the point permutation IS obtained from the DMPlexGetOrdering() function > while processing boundary conditions. > > Also I have another approach in my mind but I don't know if it's possible: > Read/construct DMPlex on single core including boundary conditions. Store > BC related data in Vec or another appropriate data structure. Then > distribute this BC holding data structure too as well as DMPlex. > This is by far the easier approach. If you do not have meshes that are too big to load in serial, I would do this. Here is what you do: - Read in the mesh onto 1 process - Mark the boundary conditions, probably with a DMLabel - Make a Section over the mesh indicating what data you have for BC - Create a Vec from this Section and fill it with boundary values (DMCreateGlobalVector) - Distribute the mesh, and keep the point SF (DMPlexDIstribute) - Create a BC SF from the points SF (PetscSFCreateSectionSF) - DIstribute the BC values using the BC SF (PetscSFBcast) Thanks, Matt > I would greatly appreciate your thoughts and any suggestions you might > have regarding this approach. Looking forward to hearing your insights. > > Best regards, > > Onur > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From onur.notonur at proton.me Tue Oct 31 04:59:27 2023 From: onur.notonur at proton.me (onur.notonur) Date: Tue, 31 Oct 2023 09:59:27 +0000 Subject: [petsc-users] Advices on creating DMPlex from custom input format In-Reply-To: References: Message-ID: Dear Matt and Jed, Thank you so much for your insights. Jed, as far as I know, the format is custom internal structure. I will double-check this. If it is used outside, I'm more than willing to contribute the reader. 
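Just to check that I am reading the recipe correctly, here is my rough translation of those steps into calls (untested; I use a plain array instead of a Vec for brevity, and bcSection / bcSerial[] are assumed to have been built while marking the boundary on the serial mesh):

#include <petscdmplex.h>
#include <petscsf.h>

/* My (untested) reading of the suggested workflow. bcSection lays out the BC values
 * over the points of the serial mesh and bcSerial[] holds those values in that layout.
 * Assumes we actually run in parallel, so DMPlexDistribute() really creates dmDist. */
static PetscErrorCode DistributeBCData(DM dmSerial, PetscSection bcSection, const PetscScalar bcSerial[],
                                       DM *dmDist, PetscSection *bcSectionDist, PetscScalar **bcDist)
{
  PetscSF   pointSF, bcSF;
  PetscInt *remoteOffsets = NULL;
  PetscInt  n;

  PetscFunctionBeginUser;
  /* Distribute the mesh and keep the point SF mapping serial points to their new owners */
  PetscCall(DMPlexDistribute(dmSerial, 0, &pointSF, dmDist));
  /* Rebuild the BC layout over the distributed mesh */
  PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dmSerial), bcSectionDist));
  PetscCall(PetscSFDistributeSection(pointSF, bcSection, &remoteOffsets, *bcSectionDist));
  /* Turn the point SF into an SF over the BC dofs */
  PetscCall(PetscSFCreateSectionSF(pointSF, bcSection, remoteOffsets, *bcSectionDist, &bcSF));
  /* Push the serial BC values out to the owning ranks */
  PetscCall(PetscSectionGetStorageSize(*bcSectionDist, &n));
  PetscCall(PetscMalloc1(n, bcDist));
  PetscCall(PetscSFBcastBegin(bcSF, MPIU_SCALAR, bcSerial, *bcDist, MPI_REPLACE));
  PetscCall(PetscSFBcastEnd(bcSF, MPIU_SCALAR, bcSerial, *bcDist, MPI_REPLACE));
  PetscCall(PetscFree(remoteOffsets));
  PetscCall(PetscSFDestroy(&bcSF));
  PetscCall(PetscSFDestroy(&pointSF));
  PetscFunctionReturn(PETSC_SUCCESS);
}

If I understand correctly, DMPlexDistributeField() wraps the last few of these steps for the Vec case, so maybe I can use that directly.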
Best, Onur Sent with [Proton Mail](https://proton.me/) secure email. ------- Original Message ------- On Monday, October 30th, 2023 at 8:16 PM, Matthew Knepley wrote: > On Mon, Oct 30, 2023 at 5:37?AM onur.notonur via petsc-users wrote: > >> Hi, >> >> I hope this message finds you all in good health and high spirits. >> >> I wanted to discuss an approach problem input file reading/processing in a solver which is using PETSc DMPlex. In our team we have a range of solvers, they are not built on PETSc except this one, but they all share a common problem input format. This format includes essential data such as node coordinates, element connectivity, boundary conditions based on elements, and specific metadata related to the problem. I create an array for boundary points on each rank and utilize them in our computations, I am doing it hardcoded currently but I need to start reading those input files, But I am not sure about the approach. >> >> Here's what I have in mind: >> >> - - Begin by reading the node coordinates and connectivity on a single core. >> - Utilize the DMPlexCreateFromCellListPetsc() function to construct the DMPlex. >> - Distribute the mesh across processors. >> - Proceed to read and process the boundary conditions on each processor. If the global index of the boundary element corresponds to that processor, we process it; otherwise, we pass. >> >> Additionally, maybe I need to reorder the mesh. In that case I think I can use the point permutation IS obtained from the DMPlexGetOrdering() function while processing boundary conditions. >> >> Also I have another approach in my mind but I don't know if it's possible: Read/construct DMPlex on single core including boundary conditions. Store BC related data in Vec or another appropriate data structure. Then distribute this BC holding data structure too as well as DMPlex. > > This is by far the easier approach. If you do not have meshes that are too big to load in serial, I would do > this. Here is what you do: > > - Read in the mesh onto 1 process > - Mark the boundary conditions, probably with a DMLabel > - Make a Section over the mesh indicating what data you have for BC > - Create a Vec from this Section and fill it with boundary values (DMCreateGlobalVector) > - Distribute the mesh, and keep the point SF (DMPlexDIstribute) > - Create a BC SF from the points SF (PetscSFCreateSectionSF) > - DIstribute the BC values using the BC SF (PetscSFBcast) > > Thanks, > > Matt > >> I would greatly appreciate your thoughts and any suggestions you might have regarding this approach. Looking forward to hearing your insights. >> >> Best regards, >> >> Onur > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://www.cse.buffalo.edu/~knepley/](http://www.cse.buffalo.edu/~knepley/) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Oct 31 06:00:00 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 31 Oct 2023 07:00:00 -0400 Subject: [petsc-users] Boundary integral problem In-Reply-To: <1621606128.5024918.1695630786287@mail.yahoo.com> References: <1621606128.5024918.1695630786287.ref@mail.yahoo.com> <1621606128.5024918.1695630786287@mail.yahoo.com> Message-ID: On Mon, Sep 25, 2023 at 8:58?AM Azeddine Messikh via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear developers > > I tried to run ex24.c > https://petsc.org/release/src/snes/tutorials/ex24.c.html using the > following command line > > ./ex24 -sol_type quadratic -dm_plex_simplex 0 -field_petscspace_degree 1 > -potential_petscspace_degree 1 -dm_plex_box_faces 2,1 > > I discovered that at > > 254: PetscCall (PetscWeakFormSetIndexBdResidual(wf, label, 1, 0, 0, 0, f0_bd_quadratic_q, 0, NULL)); > > reverses the value of the integrals at the top only. That is > The boundary integral corresponding to node 5 becomes that of 4 and > vise-versa. > Same thing for nodes 5 and 6. > I apologize for taking so long to reply. This email fell out of my Inbox. I believe the problem is understanding the ordering of unknowns in Plex. For all shapes, I orient the boundary to have outward normals. This means that for quads, the vertex ordering would be 1--2--5--4 and 2--3--6--5 Does this make sense? Thanks, Matt > The mesh index is as follows > *4---*5---*6 > | | | > | | | > *1--- *2---*3 > > However, if I use -dm_plex_simplex 1 there is no problem. > > The model is in the form Au = b > > the value of b with "-dm_plex_simplex 0" is > [0.25 > 0.0104167 > 0. > 0. > 0.145833 > 0. > -0.583333 > 0.177083 > 0. > 0.0833333 > -0.28125 > 0. > 0. > -0.6875 > 0. > -0.75 > -0.364583 > 0.] > > and for -dm_plex_simplex 1 > [0.0833333 > 0.0104167 > 0. > 0. > 0.145833 > 0. > -0.583333 > 0.177083 > 0. > 0.25 > -0.260417 > 0. > 1.43404e-16 > -0.645833 > 0. > -0.75 > -0.427083 > 0.] > > you can see that the value at node 1 =0.25 and node 4 = 0.0833333 ( > simplex 0) > which is reversed, that is, node 4 =0.25 and node 1 = 0.0833333 > (simplex 1) > > So, my own calculation shows that at node 1 should be 0.083333 not 0.25. > The -dm_plex_simplex 1 gives the correct answer but -dm_plex_simplex 0 > gives wrong answer. > > > Would you please help me in this matter. > Sincerely yours > Azeddine M > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 31 08:29:59 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 31 Oct 2023 09:29:59 -0400 Subject: [petsc-users] Kokkos PtAP error Message-ID: I am getting this error. This is in GAMG/HEM setup. PtAP for the coarse grid construction works, but I call this in a graph routine (/global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043). Also, this PtAP does not need to be on the GPU anyway because P is extremely sparse ... can I pin, say P, to the CPU to keep this all on the host? Thanks, Mark [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Unspecified symbolic phase for product AB with A mpiaijkokkos, B mpiaij. Call MatProductSetFromOptions() first [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) source: command line [0]PETSC ERROR: Option left: name:-ksp_viewxx (no value) source: command line [0]PETSC ERROR: Option left: name:-log_view_gpu_timexxx (no value) source: command line [0]PETSC ERROR: Option left: name:-options_left (no value) source: command line [0]PETSC ERROR: Option left: name:-pc_gamg_use_aggressive_square_graph value: true source: command line [0]PETSC ERROR: Option left: name:-pc_gamg_use_minimum_degree_ordering value: false source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.20.0-168-ga7898f52c39 GIT Date: 2023-10-28 10:07:38 -0500 [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/./ex13 on a arch-perlmutter-dbg-gcc-kokkos-cuda named nid001680 by madams Tue Oct 31 06:21:25 2023 [0]PETSC ERROR: Configure options --CFLAGS=" -g" --CXXFLAGS=" -g" --CUDAFLAGS="-g -Xcompiler -rdynamic" --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --FFLAGS=" -g " --COPTFLAGS=" -O0" --CXXOPTFLAGS=" -O0" --FOPTFLAGS=" -O0" --download-triangle=1 --with-debugging=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec="srun -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_ARCH=arch-perlmutter-dbg-gcc-kokkos-cuda [0]PETSC ERROR: #1 MatProductSymbolic() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:807 [0]PETSC ERROR: #2 MatProductSymbolic_PtAP_Unsafe() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:73 [0]PETSC ERROR: #3 MatProductSymbolic_Unsafe() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:185 [0]PETSC ERROR: #4 MatProductSymbolic() at /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:795 [0]PETSC ERROR: #5 MatPtAP() at /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9938 [0]PETSC ERROR: #6 MatCoarsenApply_HEM_private() at /global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 31 13:24:52 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 31 Oct 2023 14:24:52 -0400 Subject: [petsc-users] Kokkos PtAP error In-Reply-To: References: Message-ID: Correction, I get the same message with -mat_type aijcusparse. Thanks, Mark On Tue, Oct 31, 2023 at 9:29?AM Mark Adams wrote: > I am getting this error. > This is in GAMG/HEM setup. PtAP for the coarse grid construction works, > but I call this in a graph routine > (/global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043). > > Also, this PtAP does not need to be on the GPU anyway because P is > extremely sparse ... can I pin, say P, to the CPU to keep this all on the > host? > > Thanks, > Mark > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Unspecified symbolic phase for product AB with A > mpiaijkokkos, B mpiaij. Call MatProductSetFromOptions() first > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! 
> [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) > source: command line > [0]PETSC ERROR: Option left: name:-ksp_viewxx (no value) source: command > line > [0]PETSC ERROR: Option left: name:-log_view_gpu_timexxx (no value) > source: command line > [0]PETSC ERROR: Option left: name:-options_left (no value) source: > command line > [0]PETSC ERROR: Option left: name:-pc_gamg_use_aggressive_square_graph > value: true source: command line > [0]PETSC ERROR: Option left: name:-pc_gamg_use_minimum_degree_ordering > value: false source: command line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.20.0-168-ga7898f52c39 > GIT Date: 2023-10-28 10:07:38 -0500 > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/./ex13 on a > arch-perlmutter-dbg-gcc-kokkos-cuda named nid001680 by madams Tue Oct 31 > 06:21:25 2023 > [0]PETSC ERROR: Configure options --CFLAGS=" -g" --CXXFLAGS=" -g" > --CUDAFLAGS="-g -Xcompiler -rdynamic" --with-cc=cc --with-cxx=CC > --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --FFLAGS=" -g " --COPTFLAGS=" > -O0" --CXXOPTFLAGS=" -O0" --FOPTFLAGS=" -O0" --download-triangle=1 > --with-debugging=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec="srun > -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels > --with-kokkos-kernels-tpl=0 --with-make-np=8 > PETSC_ARCH=arch-perlmutter-dbg-gcc-kokkos-cuda > [0]PETSC ERROR: #1 MatProductSymbolic() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:807 > [0]PETSC ERROR: #2 MatProductSymbolic_PtAP_Unsafe() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:73 > [0]PETSC ERROR: #3 MatProductSymbolic_Unsafe() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:185 > [0]PETSC ERROR: #4 MatProductSymbolic() at > /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:795 > [0]PETSC ERROR: #5 MatPtAP() at > /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9938 > [0]PETSC ERROR: #6 MatCoarsenApply_HEM_private() at > /global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Oct 31 14:03:57 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 31 Oct 2023 15:03:57 -0400 Subject: [petsc-users] Kokkos PtAP error In-Reply-To: References: Message-ID: In reading the error message I see that I did not clone A, to get P, so P was the wrong type with a device. Thanks, Mark On Tue, Oct 31, 2023 at 2:24?PM Mark Adams wrote: > Correction, I get the same message with -mat_type aijcusparse. > > Thanks, > Mark > > On Tue, Oct 31, 2023 at 9:29?AM Mark Adams wrote: > >> I am getting this error. >> This is in GAMG/HEM setup. PtAP for the coarse grid construction works, >> but I call this in a graph routine >> (/global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043). >> >> Also, this PtAP does not need to be on the GPU anyway because P is >> extremely sparse ... can I pin, say P, to the CPU to keep this all on the >> host? >> >> Thanks, >> Mark >> >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Petsc has generated inconsistent data >> [0]PETSC ERROR: Unspecified symbolic phase for product AB with A >> mpiaijkokkos, B mpiaij. Call MatProductSetFromOptions() first >> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the >> program crashed before usage or a spelling mistake, etc! 
>> [0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) >> source: command line >> [0]PETSC ERROR: Option left: name:-ksp_viewxx (no value) source: >> command line >> [0]PETSC ERROR: Option left: name:-log_view_gpu_timexxx (no value) >> source: command line >> [0]PETSC ERROR: Option left: name:-options_left (no value) source: >> command line >> [0]PETSC ERROR: Option left: name:-pc_gamg_use_aggressive_square_graph >> value: true source: command line >> [0]PETSC ERROR: Option left: name:-pc_gamg_use_minimum_degree_ordering >> value: false source: command line >> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.20.0-168-ga7898f52c39 >> GIT Date: 2023-10-28 10:07:38 -0500 >> [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/./ex13 on a >> arch-perlmutter-dbg-gcc-kokkos-cuda named nid001680 by madams Tue Oct 31 >> 06:21:25 2023 >> [0]PETSC ERROR: Configure options --CFLAGS=" -g" --CXXFLAGS=" -g" >> --CUDAFLAGS="-g -Xcompiler -rdynamic" --with-cc=cc --with-cxx=CC >> --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --FFLAGS=" -g " --COPTFLAGS=" >> -O0" --CXXOPTFLAGS=" -O0" --FOPTFLAGS=" -O0" --download-triangle=1 >> --with-debugging=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec="srun >> -G4" --with-batch=0 --download-kokkos --download-kokkos-kernels >> --with-kokkos-kernels-tpl=0 --with-make-np=8 >> PETSC_ARCH=arch-perlmutter-dbg-gcc-kokkos-cuda >> [0]PETSC ERROR: #1 MatProductSymbolic() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:807 >> [0]PETSC ERROR: #2 MatProductSymbolic_PtAP_Unsafe() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:73 >> [0]PETSC ERROR: #3 MatProductSymbolic_Unsafe() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:185 >> [0]PETSC ERROR: #4 MatProductSymbolic() at >> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:795 >> [0]PETSC ERROR: #5 MatPtAP() at >> /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9938 >> [0]PETSC ERROR: #6 MatCoarsenApply_HEM_private() at >> /global/u2/m/madams/petsc/src/mat/coarsen/impls/hem/hem.c:1043 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria.rolandi93 at gmail.com Tue Oct 31 22:30:56 2023 From: victoria.rolandi93 at gmail.com (Victoria Rolandi) Date: Tue, 31 Oct 2023 20:30:56 -0700 Subject: [petsc-users] Error using Metis with PETSc installed with MUMPS Message-ID: Hi, I'm solving a large sparse linear system in parallel and I am using PETSc with MUMPS. I am trying to test different options, like the ordering of the matrix. Everything works if I use the *-mat_mumps_icntl_7 2 *or *-mat_mumps_icntl_7 0 *options (with the first one, AMF, performing better than AMD), however when I test METIS *-mat_mumps_icntl_7** 5 *I get an error (reported at the end of the email). I have configured PETSc with the following options: --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-scalar-type=complex --with-debugging=0 --with-precision=single --download-mumps --download-scalapack --download-parmetis --download-metis and the installation didn't give any problems. Could you help me understand why metis is not working? Thank you in advance, Victoria Error: ****** ANALYSIS STEP ******** ** Maximum transversal (ICNTL(6)) not allowed because matrix is distributed Processing a graph of size: 699150 with 69238690 edges Ordering based on METIS 510522 37081376 [100] [10486 699150] Error! 
Unknown CType: -1
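For completeness, my understanding is that the ordering choice I am passing on the command line corresponds to setting ICNTL(7) on the factored matrix, roughly like this (an untested sketch of how I read the API; UseMetisOrdering() and the ksp argument are just placeholders for my solver setup):

#include <petscksp.h>

/* Sketch (untested): select the MUMPS ordering from code, equivalent to
 * -mat_mumps_icntl_7 5. Assumes ksp already has its operators set. */
static PetscErrorCode UseMetisOrdering(KSP ksp)
{
  PC  pc;
  Mat F;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
  PetscCall(PCFactorSetUpMatSolverType(pc)); /* create the MUMPS factor matrix so its options can be set */
  PetscCall(PCFactorGetMatrix(pc, &F));
  PetscCall(MatMumpsSetIcntl(F, 7, 5));      /* ICNTL(7): ordering; 5 = METIS */
  PetscFunctionReturn(PETSC_SUCCESS);
}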