[petsc-users] How to efficiently fill in, in parallel, a PETSc matrix from a COO sparse matrix?
Diego Magela Lemos
diegomagela at usp.br
Wed Jun 21 13:16:01 CDT 2023
So far, I've tried this:
// fill_in_matrix.cc

static char help[] = "Fill in a parallel COO format sparse matrix.";

#include <petsc.h>
#include <vector>

int main(int argc, char **args)
{
    MPI_Comm    comm;
    Mat         A;
    PetscInt    m = 5;
    PetscMPIInt rank, size;

    PetscCall(PetscInitialize(&argc, &args, NULL, help));
    comm = PETSC_COMM_WORLD;
    PetscCallMPI(MPI_Comm_rank(comm, &rank));
    PetscCallMPI(MPI_Comm_size(comm, &size));

    PetscCall(MatCreate(comm, &A));
    PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, m, m));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatSetUp(A));

    // COO triples; the (0, 0) entry appears twice and is summed by ADD_VALUES.
    std::vector<PetscInt>    coo_i{0, 0, 1, 2, 3, 4};
    std::vector<PetscInt>    coo_j{0, 0, 1, 2, 3, 4};
    std::vector<PetscScalar> coo_v{2, -1, 2, 3, 4, 5};

    PetscCallMPI(MPI_Comm_rank(comm, &rank));
    if (rank == 0)
    {
        // Insert one (i, j, v) triple at a time.
        for (size_t j = 0; j < coo_i.size(); j++)
            PetscCall(MatSetValues(A,
                                   1, &coo_i.at(j),
                                   1, &coo_j.at(j),
                                   &coo_v.at(j),
                                   ADD_VALUES));

        PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
        PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
        PetscCall(MatView(A, PETSC_VIEWER_STDERR_WORLD));
    }

    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
}
When running with one process it works like a charm, but with more than one
process the code never finishes; it gets stuck forever.
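Note that MatAssemblyBegin/End and MatView are collective on the matrix's
communicator. A variant worth comparing against is one where only rank 0
still inserts the entries but every rank reaches the collective calls; a
sketch of just that portion of the program above (the surrounding setup is
unchanged):

    if (rank == 0)
    {
        // Only rank 0 contributes entries; PETSc ships off-process values
        // to their owning ranks during assembly.
        for (size_t j = 0; j < coo_i.size(); j++)
            PetscCall(MatSetValues(A,
                                   1, &coo_i.at(j),
                                   1, &coo_j.at(j),
                                   &coo_v.at(j),
                                   ADD_VALUES));
    }

    // Collective calls: every rank must make them, even ranks that set nothing.
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatView(A, PETSC_VIEWER_STDERR_WORLD));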
On Wed, Jun 21, 2023 at 2:12 PM Matthew Knepley <knepley at gmail.com>
wrote:
> On Wed, Jun 21, 2023 at 12:57 PM Diego Magela Lemos <diegomagela at usp.br>
> wrote:
>
>> Unfortunately, I cannot modify ex2 to perform the matrix fill-in using
>> only one rank, although I have understood how it works.
>> Is it so hard to do that?
>>
>
> It should not be. Maybe describe what is not clear? ex2 runs in parallel
> now.
>
> Thanks,
>
> Matt
>
>
>> Thank you.
>>
>>
>> On Wed, Jun 21, 2023 at 12:20 PM Mark Adams <mfadams at lbl.gov>
>> wrote:
>>
>>> ex2 looks the same as the code at the beginning of the thread, which
>>> looks fine to me, yet fails.
>>> (the only thing I can think of is that &v.at(i) is not doing what one
>>> wants)
>>>
>>> Diego: I would start with this ex2.c, add your view statement, verify;
>>> incrementally change ex2 to your syntax and see where it breaks.
>>>
>>> Mark
>>>
>>> On Wed, Jun 21, 2023 at 9:50 AM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Wed, Jun 21, 2023 at 9:22 AM Diego Magela Lemos <diegomagela at usp.br>
>>>> wrote:
>>>>
>>>>> Please, could you provide a minimal working example (or link) of how
>>>>> to do this?
>>>>>
>>>>
>>>> You can see here
>>>>
>>>> https://petsc.org/main/src/ksp/ksp/tutorials/ex2.c.html
>>>>
>>>> that each process only sets values for the rows it owns.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
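The ex2 pattern Matthew points to amounts to asking the matrix for its
ownership range and then inserting only the locally owned rows. A pared-down
sketch of that loop (using a 1D tridiagonal stencil here rather than the 2D
Laplacian of ex2, with an illustrative size n):

    Mat      A;
    PetscInt Istart, Iend, n = 100;

    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatSetUp(A));

    // Each rank owns a contiguous block of rows [Istart, Iend).
    PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));

    // Insert entries only for the locally owned rows.
    for (PetscInt i = Istart; i < Iend; i++) {
        PetscScalar v = 2.0;
        PetscCall(MatSetValues(A, 1, &i, 1, &i, &v, ADD_VALUES));
        v = -1.0;
        if (i > 0) {
            PetscInt j = i - 1;
            PetscCall(MatSetValues(A, 1, &i, 1, &j, &v, ADD_VALUES));
        }
        if (i < n - 1) {
            PetscInt j = i + 1;
            PetscCall(MatSetValues(A, 1, &i, 1, &j, &v, ADD_VALUES));
        }
    }

    // Assembly is collective; every rank calls it.
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));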
>>>>
>>>>> Thank you.
>>>>>
>>>>> On Tue, Jun 20, 2023 at 3:08 PM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Jun 20, 2023 at 2:02 PM Diego Magela Lemos <
>>>>>> diegomagela at usp.br> wrote:
>>>>>>
>>>>>>> So... what do I need to do, please?
>>>>>>> Why am I getting wrong results when solving the linear system if the
>>>>>>> matrix is filled in with MatSetPreallocationCOO and MatSetValuesCOO?
>>>>>>>
>>>>>>
>>>>>> It appears that you have _all_ processes submit _all_ triples (i, j,
>>>>>> v). Each triple can only be submitted by a single process. You can fix this
>>>>>> in many ways. For example, an easy but suboptimal way is just to have
>>>>>> process 0 submit them all, and all other processes submit nothing.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
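For the COO interface specifically, the "process 0 submits everything"
approach Matthew describes could look roughly like this. This is a sketch,
assuming A has already been created and sized as in the program at the top
of the thread and that the PETSc version is recent enough to provide
MatSetPreallocationCOO/MatSetValuesCOO:

    // Only rank 0 provides the COO pattern and values; every other rank
    // provides a zero-length contribution. Both calls below are collective,
    // so each rank makes them.
    std::vector<PetscInt>    coo_i, coo_j;
    std::vector<PetscScalar> coo_v;
    if (rank == 0) {
        coo_i = {0, 0, 1, 2, 3, 4};
        coo_j = {0, 0, 1, 2, 3, 4};
        coo_v = {2, -1, 2, 3, 4, 5}; // repeated (0, 0) entries are summed
    }

    PetscCall(MatSetPreallocationCOO(A, (PetscCount)coo_i.size(),
                                     coo_i.data(), coo_j.data()));
    PetscCall(MatSetValuesCOO(A, coo_v.data(), ADD_VALUES));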
>>>>>>> On Tue, Jun 20, 2023 at 2:56 PM Jed Brown <jed at jedbrown.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Matthew Knepley <knepley at gmail.com> writes:
>>>>>>>>
>>>>>>>> >> The matrix entries are multiplied by 2, that is, the number of
>>>>>>>> >> processes used to execute the code.
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > No. This was mostly intended for GPUs, where there is 1 process.
>>>>>>>> > If you want to use multiple MPI processes, then each process can
>>>>>>>> > only introduce some disjoint subset of the values. This is also
>>>>>>>> > how MatSetValues() works, but it might not be as obvious.
>>>>>>>>
>>>>>>>> They need not be disjoint, just sum to the expected values. This
>>>>>>>> interface is very convenient for FE and FV methods. MatSetValues with
>>>>>>>> ADD_VALUES has similar semantics without the intermediate storage, but it
>>>>>>>> forces you to submit one element matrix at a time. Classic parallelism
>>>>>>>> granularity versus memory use tradeoff with MatSetValuesCOO being a clear
>>>>>>>> win on GPUs and more nuanced for CPUs.
>>>>>>>>
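As an illustration of Jed's point that the per-rank COO contributions only
have to sum to the expected values rather than be disjoint: on two ranks,
both ranks can contribute to the same global entry. Again a sketch, assuming
A is a parallel matrix of global size at least 2x2 and the run uses two
ranks:

    std::vector<PetscInt>    coo_i, coo_j;
    std::vector<PetscScalar> coo_v;
    if (rank == 0) {
        coo_i = {0, 1};
        coo_j = {0, 1};
        coo_v = {1.0, 2.0};
    } else if (rank == 1) {
        coo_i = {0};
        coo_j = {0};
        coo_v = {1.0}; // overlaps rank 0's (0, 0) contribution
    }

    PetscCall(MatSetPreallocationCOO(A, (PetscCount)coo_i.size(),
                                     coo_i.data(), coo_j.data()));
    PetscCall(MatSetValuesCOO(A, coo_v.data(), ADD_VALUES));
    // Afterwards A(0, 0) == 2.0: the two 1.0 contributions were summed
    // across ranks.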
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>