[petsc-users] Implementing checkpoints in matrix construction

Jed Brown jed at jedbrown.org
Fri Aug 17 15:00:43 CDT 2018


Can you split the computation into several parts such that the matrix you desire is a sum?

  A = A_1 + A_2 + A_3 + ...

Then you can create and write each of the matrices A_i separately
(perhaps in parallel jobs) and sum them later.
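
A minimal sketch of that idea, in plain Python rather than PETSc, assuming the assembly loop can be cut into independent chunks whose contributions add up. Everything named here (element_contribution, build_part, run_job, the A_k.json files) is hypothetical and only illustrates the bookkeeping. With PETSc itself, one would presumably write each A_i with MatView on a binary viewer, read it back with MatLoad, and accumulate the sum with MatAXPY.

```python
import json
import os
import tempfile
from collections import defaultdict


def element_contribution(e):
    """Hypothetical stand-in for the expensive computation of one loop item.

    Returns a small set of (row, col) -> value contributions; neighbouring
    items overlap on the shared diagonal entry.
    """
    return {(e, e): 1.0, (e, e + 1): -1.0, (e + 1, e): -1.0, (e + 1, e + 1): 1.0}


def build_part(elements):
    """Assemble the partial matrix A_k from one chunk of the loop."""
    part = defaultdict(float)
    for e in elements:
        for key, value in element_contribution(e).items():
            part[key] += value
    return dict(part)


def save_part(part, path):
    """Write a partial matrix as a list of [row, col, value] triplets."""
    with open(path, "w") as f:
        json.dump([[i, j, v] for (i, j), v in part.items()], f)


def load_part(path):
    """Read a partial matrix written by save_part."""
    with open(path) as f:
        return {(i, j): v for i, j, v in json.load(f)}


def run_job(k, chunks, outdir):
    """Build and store A_k unless a previous job already finished it."""
    path = os.path.join(outdir, f"A_{k}.json")
    if not os.path.exists(path):  # the stored file acts as the checkpoint
        save_part(build_part(chunks[k]), path)
    return path


def sum_parts(paths):
    """Form A = A_1 + A_2 + ... from the stored partial matrices."""
    total = defaultdict(float)
    for path in paths:
        for key, value in load_part(path).items():
            total[key] += value
    return dict(total)


if __name__ == "__main__":
    chunks = [range(0, 2), range(2, 4), range(4, 6)]  # one chunk per job
    with tempfile.TemporaryDirectory() as outdir:
        paths = [run_job(k, chunks, outdir) for k in range(len(chunks))]
        A = sum_parts(paths)
    # The summed parts match a single uninterrupted assembly.
    assert A == build_part(range(6))
```

Each call to run_job corresponds to one batch job. Because a job that finds its file already present does nothing, a resubmitted job sequence only redoes the chunks that were not completed, and the final sum can run as a separate short job.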

"Pham, Dung Ngoc" <dnpham at wpi.edu> writes:

> Dear PETSc developers and users,
>
> I am constructing a very large matrix (~5,000,000 x 5,000,000) for a generalized eigenvalue problem in MPIAIJ format across multiple nodes. The program is to be run on a shared HPC cluster using the Slurm workload manager. Because of the many loops and calculations needed, the matrix construction time is long (it may take more than a week).
>
> Hence, I am trying to see if I can implement checkpoints in the code, so that the matrix can be constructed partially through multiple job submissions, each job picking up where the previous one left off, until the matrix is fully built and we can write the global matrix to a binary file for further eigenvalue analysis. My questions are:
> Is the PETSc MPIAIJ format amenable to such checkpoints?
> If so, are there any subroutines/functions that I can start with?
>
> I appreciate any comments/suggestions.
>
> Thank you,
> D. N. Pham

