[petsc-users] solve problem with pastix

Smith, Barry F. bsmith at mcs.anl.gov
Tue Nov 19 06:20:45 CST 2019


  Thanks for the fix.  https://gitlab.com/petsc/petsc/pipelines/96957999

> On Nov 14, 2019, at 2:04 PM, hg <hgbk2008 at gmail.com> wrote:
> 
> Hello
> 
> It turns out that hwloc is not installed on the cluster system I'm using. Without hwloc, pastix takes the branch that uses sched_setaffinity, which causes the error (see the sopalin_thread.c snippet quoted below). I could not understand the sched_setaffinity failure well enough to fix it, so I think enabling hwloc is the easier solution. Besides, hwloc is recommended for compiling Pastix according to these threads:
> 
> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html
> 
> hwloc is supported in PETSc, so I assumed that configuring with --download-hwloc would be a clean and easy solution. I made some changes in config/BuildSystem/config/packages/PaStiX.py to tell pastix to link against hwloc:
> 
> ...
> self.hwloc          = framework.require('config.packages.hwloc',self)
> ...
> if self.hwloc.found:
>       g.write('CCPASTIX   := $(CCPASTIX) -DWITH_HWLOC '+self.headers.toString(self.hwloc.include)+'\n')
>       g.write('EXTRALIB   := $(EXTRALIB) '+self.libraries.toString(self.hwloc.dlib)+'\n')
> 
> But the build fails at the link step:
> 
> Possible ERROR while running linker: exit code 1
> stderr:
> /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: undefined reference to `hwloc_topology_init'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: undefined reference to `hwloc_topology_load'
> /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: undefined reference to `hwloc_topology_destroy'
> /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `hwloc_get_obj_by_type':
> /opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to `hwloc_get_type_depth'
> /opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to `hwloc_get_obj_by_depth'
> /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `sopalin_bindthread':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: undefined reference to `hwloc_bitmap_dup'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: undefined reference to `hwloc_bitmap_singlify'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: undefined reference to `hwloc_set_cpubind'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: undefined reference to `hwloc_bitmap_free'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: undefined reference to `hwloc_bitmap_asprintf'
> 
> Any ideas are appreciated. I can attach configure.log if needed.
> 
> Giang
> 
> 
> On Thu, Nov 7, 2019 at 12:18 AM hg <hgbk2008 at gmail.com> wrote:
> Hi Barry
> 
> Maybe you're right: sched_setaffinity returns EINVAL in my case. Probably the scheduler does not allow the process to bind its threads on its own.
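For context on the EINVAL: on Linux, sched_setaffinity fails with EINVAL when the requested mask contains no CPU that is both online and permitted by the caller's cpuset. A batch scheduler may confine a job to such a cpuset, in which case binding to an absolute CPU number outside it fails even though the node has that CPU. A minimal sketch of a binding scheme that avoids this, assuming Linux and glibc (_GNU_SOURCE for the CPU_* macros), maps a logical thread index onto the n-th CPU the kernel actually allows. The helper names nth_cpu_in_set and bind_to_nth_allowed_cpu are hypothetical; they are not PaStiX functions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the id of the n-th CPU (0-based) present in
   *set, or -1 if *set holds fewer than n+1 CPUs. */
static int nth_cpu_in_set(const cpu_set_t *set, int n)
{
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, set)) {
            if (n == 0)
                return cpu;
            --n;
        }
    }
    return -1;
}

/* Hypothetical helper: pin the calling thread to its n-th *allowed* CPU
   rather than to absolute CPU n. Returns 0 on success, -1 on failure. */
static int bind_to_nth_allowed_cpu(int n)
{
    cpu_set_t allowed, mask;

    /* Ask the kernel which CPUs this thread may use; inside a batch job this
       can be a subset (cpuset) of the node's CPUs. */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        perror("sched_getaffinity");
        return -1;
    }

    int cpu = nth_cpu_in_set(&allowed, n);
    if (cpu < 0) {
        fprintf(stderr, "only %d CPUs allowed, cannot use index %d\n",
                CPU_COUNT(&allowed), n);
        return -1;
    }

    /* Bind to exactly that one CPU, which is known to be permitted. */
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        fprintf(stderr, "sched_setaffinity(cpu %d): %s\n", cpu, strerror(errno));
        return -1;
    }
    return 0;
}
```

The design choice is to treat the mask returned by sched_getaffinity as the source of truth: thread i would call bind_to_nth_allowed_cpu(i) and pick from the CPUs the job was given, instead of assuming CPUs 0..N-1 of the whole node are available.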
> 
> Giang
> 
> 
> On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   You can also just look at configure.log, which shows the exact sequence of commands PETSc used to configure and build Pastix. The recipe is in config/BuildSystem/config/packages/PaStiX.py; we don't monkey with low-level things like the affinity of external packages. My guess is that parts of your cluster system are inconsistent in this respect: the fact that one tool works and another does not indicates that they disagree with each other about what they expect.
> 
>    Barry
> 
> 
> 
> 
> > On Nov 6, 2019, at 4:02 AM, Matthew Knepley <knepley at gmail.com> wrote:
> > 
> > On Wed, Nov 6, 2019 at 4:40 AM hg <hgbk2008 at gmail.com> wrote:
> > Looking into arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c, I saw something like:
> > 
> > #ifdef HAVE_OLD_SCHED_SETAFFINITY
> >     if(sched_setaffinity(0,&mask) < 0)
> > #else /* HAVE_OLD_SCHED_SETAFFINITY */
> >     if(sched_setaffinity(0,sizeof(mask),&mask) < 0)
> > #endif /* HAVE_OLD_SCHED_SETAFFINITY */
> >       {
> >   perror("sched_setaffinity");
> >   EXIT(MOD_SOPALIN, INTERNAL_ERR);
> >       }
> > 
> > Is it possible that PETSc turns on HAVE_OLD_SCHED_SETAFFINITY during compilation?
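On the HAVE_OLD_SCHED_SETAFFINITY question: per the sched_setaffinity(2) man page, the two-argument form comes from an old glibc release (2.3.3) that briefly dropped the cpusetsize argument. With a current glibc the two-argument call would not compile against the three-argument prototype, so a build that compiles has most likely taken the three-argument branch. To check whether that call fails inside a batch job independently of PaStiX and PETSc, a small reproducer along these lines may help. It is a sketch assuming Linux and glibc; try_bind_absolute is a hypothetical name, and pinning to an absolute CPU number is an assumption about how PaStiX builds its mask, since the snippet above does not show that part.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical reproducer: pin the calling thread to absolute CPU number
   `cpu` using the three-argument sched_setaffinity call seen in
   sopalin_thread.c. Returns 0 on success, otherwise the errno value. */
static int try_bind_absolute(int cpu)
{
    cpu_set_t mask;

    /* A CPU id outside [0, CPU_SETSIZE) cannot be stored in a cpu_set_t. */
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) < 0)
        return errno;   /* EINVAL: CPU offline or outside the allowed cpuset */
    return 0;
}
```

Calling try_bind_absolute(0), try_bind_absolute(1), and so on from a tiny test program, once under sbatch and once without it, would show whether specific CPU numbers are rejected only inside the job.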
> > 
> > How can I trigger recompilation of external packages from PETSc? I would like to go in there and check what's going on.
> > 
> > If we built it during configure, then you can just go to
> > 
> >   $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/
> > 
> > and rebuild/install it to test. If you want configure to do it, you have to delete
> > 
> >   $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix
> > 
> > and reconfigure.
> > 
> >   Thanks,
> > 
> >      Matt
> >  
> > Giang
> > 
> > 
> > On Wed, Nov 6, 2019 at 10:12 AM hg <hgbk2008 at gmail.com> wrote:
> > The "sched_setaffinity: Invalid argument" error only happens when I launch the job with sbatch. Running without the scheduler works fine. I think this has something to do with pastix.
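Since the failure appears only under sbatch, one way to test the cpuset hypothesis is to print the CPUs the kernel allows the process to run on, once inside an sbatch job and once outside, and compare. A minimal sketch assuming Linux and glibc; count_allowed_cpus is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: print the CPUs this process may run on and return
   how many there are, or -1 if the affinity mask cannot be read. */
static int count_allowed_cpus(void)
{
    cpu_set_t allowed;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    printf("allowed CPUs:");
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &allowed))
            printf(" %d", cpu);
    printf("\n");
    return CPU_COUNT(&allowed);
}
```

If the list printed under sbatch is much shorter than the node's core count, or does not include the CPU numbers PaStiX tries to bind to, that would be consistent with the scheduler confining the job rather than with a problem in the PETSc build.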
> > 
> > Giang
> > 
> > 
> > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > 
> >   Google finds this https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> > 
> > 
> > 
> > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > > 
> > > I have no idea. That is a good question for the PasTix list.
> > > 
> > >   Thanks,
> > > 
> > >     Matt
> > > 
> > > On Tue, Nov 5, 2019 at 5:32 PM hg <hgbk2008 at gmail.com> wrote:
> > > Should thread affinity be involved at all? I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1.
> > > 
> > > Giang
> > > 
> > > 
> > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley <knepley at gmail.com> wrote:
> > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users <petsc-users at mcs.anl.gov> wrote:
> > > Hello
> > > 
> > > I get a crash when using Pastix as the solver for KSP. The error message looks like this:
> > > 
> > > ....
> > > NUMBER of BUBBLE 1
> > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0
> > > ** End of Partition & Distribution phase **
> > >    Time to analyze                              0.225 s
> > >    Number of nonzeros in factorized matrix      708784076
> > >    Fill-in                                      12.2337
> > >    Number of operations (LU)                    2.80185e+12
> > >    Prediction Time to factorize (AMD 6180  MKL) 394 s
> > > 0 : SolverMatrix size (without coefficients)    32.4 MB
> > > 0 : Number of nonzeros (local block structure)  365309391
> > >  Numerical Factorization (LU) :
> > > 0 : Internal CSC size                           1.08 GB
> > >    Time to fill internal csc                    6.66 s
> > >    --- Sopalin : Allocation de la structure globale ---
> > >    --- Fin Sopalin Init                             ---
> > >    --- Initialisation des tableaux globaux          ---
> > > sched_setaffinity: Invalid argument
> > > [node083:165071] *** Process received signal ***
> > > [node083:165071] Signal: Aborted (6)
> > > [node083:165071] Signal code:  (-6)
> > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680]
> > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207]
> > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8]
> > > [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d]
> > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core)
> > >    --- Sopalin : Local structure allocation         ---
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2]
> > > [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2]
> > > [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31]
> > > [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170]
> > > [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2]
> > > [node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325]
> > > [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b]
> > > [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552]
> > > [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09]
> > > [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9]
> > > [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81]
> > > [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e]
> > > 
> > > Does anyone have an idea what the problem is and how to fix it? The PETSc options I used are below:
> > > 
> > > It looks like PasTix is having trouble setting the thread affinity:
> > > 
> > > sched_setaffinity: Invalid argument
> > > 
> > > so it may be your build of PasTix.
> > > 
> > >   Thanks,
> > > 
> > >      Matt
> > >  
> > > -pc_type lu
> > > -pc_factor_mat_solver_package pastix
> > > -mat_pastix_verbose 2
> > > -mat_pastix_threadnbr 1
> > > 
> > > Giang
> > > 
> > > 
> > > 
> > > -- 
> > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > > -- Norbert Wiener
> > > 
> > > https://www.cse.buffalo.edu/~knepley/
> > > 
> > 
> 


