[petsc-users] random SLEPc segfault using openmpi-3.0.1

Matthew Knepley knepley at gmail.com
Fri Oct 19 14:35:05 CDT 2018


On Fri, Oct 19, 2018 at 3:09 PM Moritz Cygorek <mcygorek at uottawa.ca> wrote:

> Hi,
>
> I'm using SLEPc to diagonalize a huge sparse matrix and I've encountered
> random segmentation faults.
>
> I'm actually using the SLEPc example 4 without modifications, to rule out
> errors due to my own coding.
>
> Concretely, I use the command line
>
>
> ompirun -n 28 ex4 \
> -file amatrix.bin -eps_tol 1e-6 -eps_target 0 -eps_nev 18 \
> -eps_harmonic -eps_ncv 40 -eps_max_it 100000 \
> -eps_monitor -eps_view  -eps_view_values -eps_view_vectors 2>&1 |tee -a
> $LOGFILE
>
>
>
> The program runs for some time (about half a day) and then stops with the
> error message
>
>
>
> [13]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
>
> There is definitely enough memory, because I'm using less than 4% of the
> available 128GB.
>
>
>
>
> Since everything worked fine on a slower computer with a different setup,
> and based on previous mailing list comments, I have the feeling that this
> might be due to some issue with MPI.
>
>
> Unfortunately, I have to share the computer with other people and cannot
> uninstall the current MPI implementation, and I've also heard that there can
> be issues if more than one MPI implementation is installed.
>
>
> For your information: I've configured PETSc with
>
>
> ./configure
> --with-mpi-dir=/home/applications/builds/intel_2018/openmpi-3.0.1/
> --with-scalar-type=complex --download-mumps --download-scalapack
> --with-blas-lapack-dir=/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl
>
> I wanted to ask a few things:
>
> - Is there a known issue with openmpi causing random segmentation faults?
>
OpenMPI has certainly had bugs, but this question is not constrained enough
to pin the fault on any one of them.
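
One standard way to narrow a fault like this down is to run a smaller case
under valgrind (the PETSc FAQ recommends this). A rough sketch, with the
process count, options, and log file name only as placeholders:

  mpiexec -n 2 valgrind -q --tool=memcheck --num-callers=20 \
    --log-file=valgrind-%p.log ./ex4 -file amatrix.bin -eps_nev 18

If valgrind reports an invalid read or write before the crash, that stack
trace is much more useful than the SEGV message alone.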

> - I've also tried to install everything needed by configuring PETSc with
> ./configure \
> --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-scalar-type=complex
> \
> --download-mumps --download-scalapack --download-mpich
> --download-fblaslapack
>
> Here, the problem is that the checks run after "make" stop after the check
> with 1 MPI process, i.e., the check using 2 MPI processes just never
> finishes.
> Is that a known conflict between the downloaded MPICH and the installed
> OpenMPI?
>

No, it likely has to do with the network configuration; that is, mpiexec is
waiting on gethostbyname() for your machine, which is failing.
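
A quick way to check that hypothesis (standard Linux commands; the hostname
lookup is what mpiexec needs to succeed):

  hostname
  getent hosts $(hostname)

If the second command prints nothing, adding a line for your hostname (for
example pointing it at 127.0.0.1) to /etc/hosts usually lets mpiexec start.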


> Do you know a way to install MPICH alongside OpenMPI without conflicts,
> i.e., without actually removing OpenMPI?
>

The above can work as long as OpenMPI is not in default compiler paths like
/usr/lib.
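
For example (a sketch; the exact PETSC_ARCH name depends on your build),
--download-mpich puts its own mpiexec under the PETSc arch directory, and
launching with that one explicitly avoids picking up the OpenMPI launcher:

  $PETSC_DIR/$PETSC_ARCH/bin/mpiexec -n 2 ./ex4 -file amatrix.bin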


> - Some time ago I posted a question to the mailing list about how to compile
> SLEPc/PETSc with OpenMP only, instead of MPI. After some time, I was able to
> get MPI to work on a different computer, but I was never really able to use
> OpenMP with SLEPc, although it would be very useful in the present
> situation. The
>

Why do you think so?


> programs compile but they never take more than 100% CPU load as displayed
> by top.
>

That is perfectly understandable since the memory bandwidth can be maxed
out with fewer cores than are present. OpenMP will not help this.
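
You can measure this yourself with the STREAMS benchmark that ships with
PETSc; roughly (run from PETSC_DIR after building, with NPMAX set to your
core count):

  cd $PETSC_DIR
  make streams NPMAX=28

The reported bandwidth typically stops increasing well before all cores are
in use, and that is the ceiling for a sparse eigensolve.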


> The answers to my question contained the recommendation that I should
> configure with --download-openblas and have the OMP_NUM_THREADS variable set
> when executing the program. I did that, but it didn't help either.
>

Yep.
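
For reference, what that recommendation amounts to in practice is roughly the
following (a sketch; the thread count is only an example, and as noted it
will not fix a bandwidth-bound run):

  export OMP_NUM_THREADS=8
  ./ex4 -file amatrix.bin -eps_target 0 -eps_nev 18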


> So, my question: has anyone ever managed to find a configure line that
> disables MPI but enables OpenMP, so that the SLEPc ex4 program shows
> significantly more than 100% CPU usage when executing the standard
> Krylov-Schur method?
>

As I said, this is likely to be impossible for architecture reasons:
https://www.mcs.anl.gov/petsc/documentation/faq.html#computers
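
For completeness, a configure line along those lines would look roughly like
this (these are real configure options, but for the reasons above it is
unlikely to give the speedup you are hoping for):

  ./configure --with-mpi=0 --with-openmp --with-scalar-type=complex \
    --download-openblas

With OMP_NUM_THREADS set at run time, most of the threading then happens
inside BLAS/LAPACK rather than in the sparse matrix-vector product, which is
why top rarely shows much more than 100%.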

  Thanks,

     Matt

> Regards,
>
> Moritz
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
