[petsc-dev] Valgrind MPI-Related Errors
Karl Rupp
rupp at iue.tuwien.ac.at
Tue Jun 2 00:34:02 CDT 2020
Hi Jacob,
the recommendation in the past was to use MPICH as it is (was?)
valgrind-clean. Which MPI do you use? OpenMPI used to have these kinds
of issues. (My information might be outdated)
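
One way to answer the "which MPI" question is to look at the launcher's version banner. The following is only a minimal sketch: the helper name classify_mpi is hypothetical, and the matched strings are assumptions based on typical banners, so they may not cover every release.

```shell
#!/bin/sh
# Hypothetical helper: guess the MPI implementation from a version banner.
# The patterns are assumptions based on typical banner text and may need
# adjusting for a particular MPI release.
classify_mpi() {
  case "$1" in
    *"Open MPI"*|*OpenRTE*) echo "openmpi" ;;
    *MPICH*|*HYDRA*)        echo "mpich" ;;
    *)                      echo "unknown" ;;
  esac
}

# Inspect whichever mpiexec is first on PATH; if several MPI installations
# coexist, call the one the application was actually linked against.
if command -v mpiexec >/dev/null 2>&1; then
  banner=$(mpiexec --version 2>&1 | head -n 5)
  echo "Detected MPI: $(classify_mpi "$banner")"
else
  echo "mpiexec not found on PATH"
fi
```

If the result is "unknown", reading the raw output of mpiexec --version directly is probably the quickest fallback.
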
Best regards,
Karli
On 6/2/20 2:43 AM, Jacob Faibussowitsch wrote:
> Hello All,
>
> TL;DR: valgrind always complains about "Syscall param write(buf) points
> to uninitialised byte(s)" for a LOT of MPI operations in petsc code,
> making debugging with valgrind fairly annoying since I have to sort
> through a ton of unrelated stuff. I have tried building valgrind from
> source, apt install valgrind, and apt install valgrind-mpi, all to no avail.
>
> I am using valgrind from docker. Dockerfile is attached below as well. I
> have been unsuccessfully trying to resolve these local valgrind errors,
> but I am running out of ideas. Googling the issue has also not provided
> entirely applicable solutions. Here is an example of the error:
>
> $ make -f gmakefile test VALGRIND=1
> ...
> #==54610== Syscall param write(buf) points to uninitialised byte(s)
> #==54610== at 0x6F63317: write (write.c:26)
> #==54610== by 0x9056AC9: MPIDI_CH3I_Sock_write (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x9059FCD: MPIDI_CH3_iStartMsg (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x903F298: MPIDI_CH3_EagerContigShortSend (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x9049479: MPID_Send (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8FC9B2A: MPIC_Send (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8F86F2E: MPIR_Bcast_intra_binomial (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8EE204E: MPIR_Bcast_intra_auto (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8EE21F4: MPIR_Bcast_impl (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8F887FB: MPIR_Bcast_intra_smp (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8EE206E: MPIR_Bcast_intra_auto (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8EE21F4: MPIR_Bcast_impl (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x8EE2A6F: PMPI_Bcast (in /usr/local/lib/libmpi.so.12.1.8)
> #==54610== by 0x4B377B8: PetscOptionsInsertFile (options.c:525)
> #==54610== by 0x4B39291: PetscOptionsInsert (options.c:672)
> #==54610== by 0x4B5B1EF: PetscInitialize (pinit.c:996)
> #==54610== by 0x10A6BA: main (ex9.c:75)
> #==54610== Address 0x1ffeffa944 is on thread 1's stack
> #==54610== in frame #3, created by MPIDI_CH3_EagerContigShortSend (???:)
> #==54610== Uninitialised value was created by a stack allocation
> #==54610== at 0x903F200: MPIDI_CH3_EagerContigShortSend (in /usr/local/lib/libmpi.so.12.1.8)
>
> There are probably 20 such errors every single time, regardless of what
> code is being run. I have tried using apt install valgrind, apt install
> valgrind-mpi, and building valgrind from source:
>
> # VALGRIND
> WORKDIR /
> RUN git clone git://sourceware.org/git/valgrind.git
> WORKDIR /valgrind
> RUN git pull
> RUN ./autogen.sh
> RUN ./configure --with-mpicc=/usr/local/bin/mpicc
> RUN make -j 5
> RUN make install
>
> None of those approaches made these errors disappear. Perhaps
> I am missing some funky MPI args?
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
>
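
When reports like the write(buf) one quoted above originate inside the MPI library rather than in PETSc or user code, a common workaround is a Valgrind suppression file. The sketch below writes one that matches the frame MPIDI_CH3I_Sock_write from the trace and prints a command line that would use it. The file name mpi-write.supp, the suppression name, and the ./ex9 path are assumptions for illustration; --suppressions is a standard Valgrind option.

```shell
#!/bin/sh
# Hypothetical file name; any writable path works.
supp_file="mpi-write.supp"

# Suppress "Syscall param write(buf)" reports whose call chain passes
# through MPIDI_CH3I_Sock_write. The "..." line matches any number of
# frames between the syscall and that function.
cat > "$supp_file" <<'EOF'
{
   mpi_sock_write_uninitialised_buf
   Memcheck:Param
   write(buf)
   ...
   fun:MPIDI_CH3I_Sock_write
}
EOF

# Print (rather than run) an example invocation; ./ex9 is an assumed path.
echo "valgrind --suppressions=$supp_file ./ex9"
```

Running Valgrind with --gen-suppressions=all prints ready-made entries for each report, which may be easier than writing them by hand. A suppression only hides the report, so it is worth confirming first that the uninitialised bytes really come from the MPI library's own stack buffer, as the trace suggests here.
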