<div dir="ltr"><div dir="ltr">FYI, this problem is fixed, provided that hwloc is added to the dependencies of Pastix.</div><div dir="ltr"><br></div><div dir="ltr"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Giang</div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Nov 14, 2019 at 9:04 PM hg <<a href="mailto:hgbk2008@gmail.com">hgbk2008@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello<div><br></div><div>It turns out that hwloc is not installed on the cluster system that I'm using. Without hwloc, pastix takes the branch that uses sched_setaffinity, which causes the error (see the sopalin_thread.c excerpt elsewhere in this thread). I could not understand the sched_setaffinity failure or find a fix for it, so I think enabling hwloc is the easier solution. By the way, hwloc is recommended for compiling Pastix according to these pages:</div><div><br></div><div><a href="https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186" rel="noreferrer" target="_blank">https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186</a><br></div><div><a href="https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html" target="_blank">https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html</a><br></div><div><br></div><div>hwloc is supported in PETSc, so I assumed that compiling with --download-hwloc would be a clean and easy solution. 
I made some changes in config/BuildSystem/config/packages/<span>PaStiX</span>.py to tell pastix to link against hwloc:<br></div><div><br></div><div>...</div><div>self.hwloc = framework.require('config.packages.hwloc',self)<br></div><div>...</div><div>if self.hwloc.found:<br>&nbsp;&nbsp;g.write('CCPASTIX := $(CCPASTIX) -DWITH_HWLOC '+self.headers.toString(self.hwloc.include)+'\n')<br>&nbsp;&nbsp;g.write('EXTRALIB := $(EXTRALIB) '+self.libraries.toString(self.hwloc.dlib)+'\n')<br></div><div><br></div><div>But the build fails at the link step:</div><div><br></div><div>Possible ERROR while running linker: exit code 1<br>stderr:<br>/opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init':<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: undefined reference to `hwloc_topology_init'<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: undefined reference to `hwloc_topology_load'<br>/opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean':<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: undefined reference to `hwloc_topology_destroy'<br>/opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `hwloc_get_obj_by_type':<br>/opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to `hwloc_get_type_depth'<br>/opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to `hwloc_get_obj_by_depth'<br>/opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `sopalin_bindthread':<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: undefined reference to `hwloc_bitmap_dup'<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: undefined reference to 
`hwloc_bitmap_singlify'<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: undefined reference to `hwloc_set_cpubind'<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: undefined reference to `hwloc_bitmap_free'<br>/home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: undefined reference to `hwloc_bitmap_asprintf'<br></div><div><br></div><div>Any ideas are appreciated. I can attach configure.log if needed.</div><div><br clear="all"><div><div dir="ltr"><div dir="ltr">Giang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Nov 7, 2019 at 12:18 AM hg <<a href="mailto:hgbk2008@gmail.com" target="_blank">hgbk2008@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi Barry<div><br></div><div>Maybe you're right: sched_setaffinity returns EINVAL in my case. Probably the scheduler does not allow the process to bind threads on its own.</div><div><br clear="all"><div><div dir="ltr"><div dir="ltr">Giang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
You can also just look at configure.log, which shows the sequence of calls PETSc used to configure and build Pastix. The recipe is in config/BuildSystem/config/packages/PaStiX.py; we don't monkey with low-level things like the affinity of external packages. My guess is that your cluster system has inconsistent parts related to this: the fact that one tool works and another does not indicates that they disagree with each other about what they expect.<br>
<br>
Barry<br>
<br>
<br>
<br>
<br>
> On Nov 6, 2019, at 4:02 AM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> <br>
> On Wed, Nov 6, 2019 at 4:40 AM hg <<a href="mailto:hgbk2008@gmail.com" target="_blank">hgbk2008@gmail.com</a>> wrote:<br>
> Looking into arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c, I saw something like:<br>
> <br>
> #ifdef HAVE_OLD_SCHED_SETAFFINITY<br>
> if(sched_setaffinity(0,&mask) < 0)<br>
> #else /* HAVE_OLD_SCHED_SETAFFINITY */<br>
> if(sched_setaffinity(0,sizeof(mask),&mask) < 0)<br>
> #endif /* HAVE_OLD_SCHED_SETAFFINITY */<br>
> {<br>
> perror("sched_setaffinity");<br>
> EXIT(MOD_SOPALIN, INTERNAL_ERR);<br>
> }<br>
> <br>
> Is there a possibility that PETSc turns on HAVE_OLD_SCHED_SETAFFINITY during compilation?<br>
> <br>
> May I know how to trigger recompilation of external packages with PETSc? I could go in there and check what's going on.<br>
> <br>
> If we built it during configure, then you can just go to<br>
> <br>
> $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/<br>
> <br>
> and rebuild/install it to test. If you want configure to do it, you have to delete<br>
> <br>
> $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix<br>
> <br>
> and reconfigure.<br>
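A minimal sketch of the second route (removing the marker file so that configure rebuilds the package), assuming the PETSC_DIR/PETSC_ARCH layout shown in the path above. The function names are hypothetical helpers, not part of PETSc, and configure still has to be rerun by hand afterwards.<br>

```python
from pathlib import Path


def pastix_pkg_conf(petsc_dir, petsc_arch):
    """Build the path $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix."""
    return Path(petsc_dir) / petsc_arch / "lib" / "petsc" / "conf" / "pkg.conf.pastix"


def force_pastix_rebuild(petsc_dir, petsc_arch):
    """Delete the marker file if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    marker = pastix_pkg_conf(petsc_dir, petsc_arch)
    if marker.is_file():
        marker.unlink()
        return True
    return False
```

For example, force_pastix_rebuild(os.environ["PETSC_DIR"], os.environ["PETSC_ARCH"]) (after importing os) would remove the file for the current build, if both variables are set.<br>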
> <br>
> Thanks,<br>
> <br>
> Matt<br>
> <br>
> Giang<br>
> <br>
> <br>
> On Wed, Nov 6, 2019 at 10:12 AM hg <<a href="mailto:hgbk2008@gmail.com" target="_blank">hgbk2008@gmail.com</a>> wrote:<br>
> The error "sched_setaffinity: Invalid argument" only happens when I launch the job with sbatch. Running without the scheduler is fine. I think this has something to do with pastix.<br>
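One way to check whether the scheduler restricts the CPUs a job may bind to is to compare the allowed CPU set inside and outside sbatch. The sketch below uses Python's os.sched_getaffinity and os.sched_setaffinity (Linux only); the helper names allowed_cpus and try_bind are illustrative and not part of PETSc or Pastix. On Linux, sched_setaffinity fails with EINVAL when the requested mask contains no CPU that the process is permitted to use, so this is one possible explanation for the error, not a confirmed diagnosis.<br>

```python
import errno
import os


def allowed_cpus(pid=0):
    """Return the sorted list of CPUs that process `pid` may run on (0 = this process)."""
    return sorted(os.sched_getaffinity(pid))


def try_bind(cpu, pid=0):
    """Try to pin `pid` to a single CPU, then restore the previous mask.

    Returns None on success, or the errno name (for example 'EINVAL') on failure.
    """
    previous = os.sched_getaffinity(pid)
    try:
        os.sched_setaffinity(pid, {cpu})
    except OSError as exc:
        # The mask is unchanged when the call fails, so nothing needs restoring.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.sched_setaffinity(pid, previous)
    return None
```

Printing allowed_cpus() once from a login shell and once from inside an sbatch script, and calling try_bind on a CPU that is missing from the second list, would show whether binding outside the allocation is what gets rejected.<br>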
> <br>
> Giang<br>
> <br>
> <br>
> On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
> <br>
> Google finds this <a href="https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186" rel="noreferrer" target="_blank">https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186</a><br>
> <br>
> <br>
> <br>
> > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
> > <br>
> > I have no idea. That is a good question for the PaStiX list.<br>
> > <br>
> > Thanks,<br>
> > <br>
> > Matt<br>
> > <br>
> > On Tue, Nov 5, 2019 at 5:32 PM hg <<a href="mailto:hgbk2008@gmail.com" target="_blank">hgbk2008@gmail.com</a>> wrote:<br>
> > Should thread affinity be invoked at all? I set -mat_pastix_threadnbr 1 and also set OMP_NUM_THREADS to 1.<br>
> > <br>
> > Giang<br>
> > <br>
> > <br>
> > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
> > Hello<br>
> > <br>
> > I get a crash when using Pastix as the solver for KSP. The error message looks like this:<br>
> > <br>
> > ....<br>
> > NUMBER of BUBBLE 1<br>
> > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0<br>
> > ** End of Partition & Distribution phase **<br>
> > Time to analyze 0.225 s<br>
> > Number of nonzeros in factorized matrix 708784076<br>
> > Fill-in 12.2337<br>
> > Number of operations (LU) 2.80185e+12<br>
> > Prediction Time to factorize (AMD 6180 MKL) 394 s<br>
> > 0 : SolverMatrix size (without coefficients) 32.4 MB<br>
> > 0 : Number of nonzeros (local block structure) 365309391<br>
> > Numerical Factorization (LU) :<br>
> > 0 : Internal CSC size 1.08 GB<br>
> > Time to fill internal csc 6.66 s<br>
> > --- Sopalin : Allocation de la structure globale ---<br>
> > --- Fin Sopalin Init ---<br>
> > --- Initialisation des tableaux globaux ---<br>
> > sched_setaffinity: Invalid argument<br>
> > [node083:165071] *** Process received signal ***<br>
> > [node083:165071] Signal: Aborted (6)<br>
> > [node083:165071] Signal code: (-6)<br>
> > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680]<br>
> > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207]<br>
> > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8]<br>
> > [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d]<br>
> > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core)<br>
> > --- Sopalin : Local structure allocation ---<br>
> > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2]<br>
> > [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2]<br>
> > [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31]<br>
> > [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170]<br>
> > [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2]<br>
> > [node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325]<br>
> > [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b]<br>
> > [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552]<br>
> > [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09]<br>
> > [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9]<br>
> > [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81]<br>
> > [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e]<br>
> > <br>
> > Does anyone have an idea what the problem is and how to fix it? The PETSc parameters I used are listed below:<br>
> > <br>
> > It looks like PaStiX is having trouble setting the thread affinity:<br>
> > <br>
> > sched_setaffinity: Invalid argument<br>
> > <br>
> > so it may be your build of PaStiX.<br>
> > <br>
> > Thanks,<br>
> > <br>
> > Matt<br>
> > <br>
> > -pc_type lu<br>
> > -pc_factor_mat_solver_package pastix<br>
> > -mat_pastix_verbose 2<br>
> > -mat_pastix_threadnbr 1<br>
> > <br>
> > Giang<br>
> > <br>
> > <br>
> > <br>
> > -- <br>
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
> > -- Norbert Wiener<br>
> > <br>
> > <a href="https://www.cse.buffalo.edu/~knepley/" rel="noreferrer" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
> > <br>
<br>
</blockquote></div></div>
</blockquote></div>
</blockquote></div></div>