[petsc-users] solve problem with pastix

Smith, Barry F. bsmith at mcs.anl.gov
Wed Nov 6 09:52:20 CST 2019


  You can also just look at configure.log, which shows the exact sequence of commands PETSc used to configure and build PaStiX. The recipe is in config/BuildSystem/config/packages/PaStiX.py; we don't monkey with low-level things like the thread affinity of external packages. My guess is that parts of your cluster system are inconsistent with each other here: the fact that one tool works and another does not indicates that they disagree about what they expect.

   Barry




> On Nov 6, 2019, at 4:02 AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Wed, Nov 6, 2019 at 4:40 AM hg <hgbk2008 at gmail.com> wrote:
> Looking into arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c, I saw something like:
> 
> #ifdef HAVE_OLD_SCHED_SETAFFINITY
>     if(sched_setaffinity(0,&mask) < 0)
> #else /* HAVE_OLD_SCHED_SETAFFINITY */
>     if(sched_setaffinity(0,sizeof(mask),&mask) < 0)
> #endif /* HAVE_OLD_SCHED_SETAFFINITY */
>     {
>       perror("sched_setaffinity");
>       EXIT(MOD_SOPALIN, INTERNAL_ERR);
>     }
> 
> Is there any possibility that PETSc turns on HAVE_OLD_SCHED_SETAFFINITY during compilation?
> 
> How can I trigger recompilation of external packages with PETSc? I would like to go in there and check what's going on.
> 
> If we built it during configure, then you can just go to
> 
>   $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/
> 
> and rebuild/install it to test. If you want configure to do it, you have to delete
> 
>   $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix
> 
> and reconfigure.
> 
>   Thanks,
> 
>      Matt
>  
> Giang
> 
> 
> On Wed, Nov 6, 2019 at 10:12 AM hg <hgbk2008 at gmail.com> wrote:
> The error "sched_setaffinity: Invalid argument" only happens when I launch the job with sbatch; running without the scheduler is fine. I think this has something to do with PaStiX.
> 
> Giang
> 
> 
> On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   Google finds this https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> 
> 
> 
> > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > 
> > I have no idea. That is a good question for the PaStiX list.
> > 
> >   Thanks,
> > 
> >     Matt
> > 
> > On Tue, Nov 5, 2019 at 5:32 PM hg <hgbk2008 at gmail.com> wrote:
> > Should thread affinity be set at all? I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1.
> > 
> > Giang
> > 
> > 
> > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley <knepley at gmail.com> wrote:
> > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > Hello
> > 
> > I get a crash when using PaStiX as the solver for KSP. The error message looks like:
> > 
> > ....
> > NUMBER of BUBBLE 1
> > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0
> > ** End of Partition & Distribution phase **
> >    Time to analyze                              0.225 s
> >    Number of nonzeros in factorized matrix      708784076
> >    Fill-in                                      12.2337
> >    Number of operations (LU)                    2.80185e+12
> >    Prediction Time to factorize (AMD 6180  MKL) 394 s
> > 0 : SolverMatrix size (without coefficients)    32.4 MB
> > 0 : Number of nonzeros (local block structure)  365309391
> >  Numerical Factorization (LU) :
> > 0 : Internal CSC size                           1.08 GB
> >    Time to fill internal csc                    6.66 s
> >    --- Sopalin : Allocation de la structure globale ---
> >    --- Fin Sopalin Init                             ---
> >    --- Initialisation des tableaux globaux          ---
> > sched_setaffinity: Invalid argument
> > [node083:165071] *** Process received signal ***
> > [node083:165071] Signal: Aborted (6)
> > [node083:165071] Signal code:  (-6)
> > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680]
> > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207]
> > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8]
> > [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d]
> > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core)
> >    --- Sopalin : Local structure allocation         ---
> > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2]
> > [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2]
> > [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31]
> > [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170]
> > [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2]
> > [node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325]
> > [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b]
> > [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552]
> > [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09]
> > [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9]
> > [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81]
> > [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e]
> > 
> > Does anyone have an idea what the problem is and how to fix it? The PETSc options I used are below:
> > 
> > It looks like PaStiX is having trouble setting the thread affinity:
> > 
> > sched_setaffinity: Invalid argument
> > 
> > so the problem may be in your build of PaStiX.
> > 
> >   Thanks,
> > 
> >      Matt
> >  
> > -pc_type lu
> > -pc_factor_mat_solver_package pastix
> > -mat_pastix_verbose 2
> > -mat_pastix_threadnbr 1
> > 
> > Giang
> > 
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/
> > 


