[petsc-dev] Valgrind MPI-Related Errors

Satish Balay balay at mcs.anl.gov
Tue Jun 2 12:33:48 CDT 2020


If using docker - I would think you should be able to make changes [as you need] and save them [as your own docker image file].

And wrt PETSC-CI, only stage-1 tests use the docker images.

A single image for the whole CI doesn't make any sense. [we test different OSes, different compilers, different variants of linux, different configure features - i.e. use pre-installed packages, also test configure features to install packages etc].

And yes - valgrind build in CI uses --download-mpich [and no docker here - it's one of the slowest builds - so needs to be run on the fastest machine we have access to]
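
For anyone wanting to check an existing install before rebuilding, here is a minimal sketch of the mpichversion check mentioned further down the thread. It assumes mpichversion prints the configure options used for the build; the helper name has_meminit is made up for illustration and is not part of MPICH or PETSc.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH or PETSc): reads `mpichversion`
# output on stdin and succeeds if an --enable-g=... option includes meminit.
has_meminit() {
    grep -Eq -e '--enable-g=[^[:space:]]*meminit'
}

# Only run the check if an MPICH install is actually on PATH.
if command -v mpichversion >/dev/null 2>&1; then
    if mpichversion | has_meminit; then
        echo "MPICH was configured with meminit"
    else
        echo "MPICH was NOT configured with meminit"
    fi
else
    echo "mpichversion not found in PATH"
fi
```

If the check reports that meminit is missing, the --download-mpich prebuild quoted below is the route suggested in this thread.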

Satish

On Tue, 2 Jun 2020, Jacob Faibussowitsch wrote:

> > MPICH needs to be built with the option --enable-g=meminit for it to be valgrind clean.
> I see. Two questions:
> 
> 1. I am using Jed's docker image for MPICH (actually this is the main reason for using Jed's image, as configuring and building MPICH takes an absolute age on docker), will this re-build the MPICH already installed in the image or download and build an entirely new MPICH?
> 
> 2. (More a question for Jed) If I am using Jed's docker image anyways, and we have CI/CD builds also using his docker images (correct me if I am wrong here) and a valgrind build in stage 3, why not include those build arguments in the MPICH included in the images? I think the valgrind build for CI/CD actually downloads and builds MPICH through petsc configure, is this because of the errors I detailed below?
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
> 
> > On Jun 2, 2020, at 11:59 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> > 
> > MPICH needs to be built with the option --enable-g=meminit for it to be valgrind clean.
> > 
> > --download-mpich does this [among other things that are useful during software development]. Pre-configured MPICH is not likely to do this.
> > 
> > You can verify with: mpichversion
> > 
> > You can prebuild MPICH using PETSc with:
> > 
> > ./configure --prefix=$HOME/soft/mpich --download-mpich CFLAGS= FFLAGS= CXXFLAGS= COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
> > 
> > And make this your default pre-installed MPI. [by adding $HOME/soft/mpich/bin to PATH]
> > 
> > Satish
> > 
> > 
> > On Tue, 2 Jun 2020, Jacob Faibussowitsch wrote:
> > 
> >> Yes I am using the pre-loaded MPICH from the docker image. Further proof from configure
> >> 
> >> #define PETSC_HAVE_MPICH_NUMVERSION 30302300
> >> #define PETSC_HAVE_MPIEXEC_ENVIRONMENTAL_VARIABLE MPIR_CVAR_CH3
> >> 
> >> Best regards,
> >> 
> >> Jacob Faibussowitsch
> >> (Jacob Fai - booss - oh - vitch)
> >> Cell: (312) 694-3391
> >> 
> >>> On Jun 2, 2020, at 11:35 AM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
> >>> 
> >>> I guess Jacob already used MPICH, since MPIDI_CH3_EagerContigShortSend() is from MPICH. 
> >>> 
> >>> --Junchao Zhang
> >>> 
> >>> 
> >>> On Tue, Jun 2, 2020 at 9:38 AM Satish Balay via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> >>> use --download-mpich for valgrind.
> >>> 
> >>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >>> 
> >>> Satish
> >>> 
> >>> On Tue, 2 Jun 2020, Karl Rupp wrote:
> >>> 
> >>>> Hi Jacob,
> >>>> 
> >>>> the recommendation in the past was to use MPICH as it is (was?)
> >>>> valgrind-clean. Which MPI do you use? OpenMPI used to have these kinds of
> >>>> issues. (My information might be outdated)
> >>>> 
> >>>> Best regards,
> >>>> Karli
> >>>> 
> >>>> On 6/2/20 2:43 AM, Jacob Faibussowitsch wrote:
> >>>>> Hello All,
> >>>>> 
> >>>>> TL;DR: valgrind always complains about "Syscall param write(buf) points to
> >>>>> uninitialised byte(s)" for a LOT of MPI operations in petsc code, making
> >>>>> debugging using valgrind fairly annoying since I have to sort through a ton
> >>>>> of unrelated stuff. I have built valgrind from source, used apt install
> >>>>> valgrind, apt install valgrind-mpi to no avail.
> >>>>> 
> >>>>> I am using valgrind from docker. Dockerfile is attached below as well. I
> >>>>> have been unsuccessfully trying to resolve these local valgrind errors, but
> >>>>> I am running out of ideas. Googling the issue has also not provided entirely
> >>>>> applicable solutions. Here is an example of the error:
> >>>>> 
> >>>>> $ make -f gmakefile test VALGRIND=1
> >>>>> ...
> >>>>> #==54610== Syscall param write(buf) points to uninitialised byte(s)
> >>>>> #==54610==    at 0x6F63317: write (write.c:26)
> >>>>> #==54610==    by 0x9056AC9: MPIDI_CH3I_Sock_write (in 
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x9059FCD: MPIDI_CH3_iStartMsg (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x903F298: MPIDI_CH3_EagerContigShortSend (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x9049479: MPID_Send (in /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8FC9B2A: MPIC_Send (in /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8F86F2E: MPIR_Bcast_intra_binomial (in 
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8EE204E: MPIR_Bcast_intra_auto (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8EE21F4: MPIR_Bcast_impl (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8F887FB: MPIR_Bcast_intra_smp (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8EE206E: MPIR_Bcast_intra_auto (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8EE21F4: MPIR_Bcast_impl (in
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x8EE2A6F: PMPI_Bcast (in /usr/local/lib/libmpi.so.12.1.8)
> >>>>> #==54610==    by 0x4B377B8: PetscOptionsInsertFile (options.c:525)
> >>>>> #==54610==    by 0x4B39291: PetscOptionsInsert (options.c:672)
> >>>>> #==54610==    by 0x4B5B1EF: PetscInitialize (pinit.c:996)
> >>>>> #==54610==    by 0x10A6BA: main (ex9.c:75)
> >>>>> #==54610==  Address 0x1ffeffa944 is on thread 1's stack
> >>>>> #==54610==  in frame #3, created by MPIDI_CH3_EagerContigShortSend (???:)
> >>>>> #==54610==  Uninitialised value was created by a stack allocation
> >>>>> #==54610==    at 0x903F200: MPIDI_CH3_EagerContigShortSend (in 
> >>>>> /usr/local/lib/libmpi.so.12.1.8)
> >>>>> 
> >>>>> There are probably 20 such errors every single time, regardless of what code
> >>>>> is being run. I have tried using apt install valgrind, apt install
> >>>>> valgrind-mpi, and building valgrind from source:
> >>>>> 
> >>>>> # VALGRIND
> >>>>> WORKDIR /
> >>>>> RUN git clone git://sourceware.org/git/valgrind.git
> >>>>> WORKDIR /valgrind
> >>>>> RUN git pull
> >>>>> RUN ./autogen.sh
> >>>>> RUN ./configure --with-mpicc=/usr/local/bin/mpicc
> >>>>> RUN make -j 5
> >>>>> RUN make install
> >>>>> 
> >>>>> None of those approaches led to these errors disappearing. Perhaps I am
> >>>>> missing some funky MPI args?
> >>>>> 
> >>>>> Best regards,
> >>>>> 
> >>>>> Jacob Faibussowitsch
> >>>>> (Jacob Fai - booss - oh - vitch)
> >>>>> Cell: (312) 694-3391
> 
> 

