[petsc-dev] valgrind errors in the SUPERLU*

Xiaoye S. Li xsli at lbl.gov
Sat Jan 16 23:21:27 CST 2016


I wrote the following dummy MPI program and followed the wiki page to
generate a suppression file.  It helps a bit, but valgrind still reports a
lot of unwanted warnings.  For an 8-process run, I got an output file with
over 45000 lines.
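
For reference, a suppression entry for the __register_atfork warning quoted
further down might look roughly like the sketch below.  The entry name
"cori_register_atfork" is made up for illustration, and the frame list is
an assumption taken from the quoted stack trace; the actual entries that
--gen-suppressions=all prints on Cori may differ.

```
# Hypothetical entry; the name on the first line is arbitrary.
{
   cori_register_atfork
   Memcheck:Cond
   fun:__register_atfork
   fun:__libc_pthread_init
   fun:__pthread_initialize_minimal
}
```

Saved to a file (say cori.supp, also a made-up name), this would be passed
to the next run with --suppressions=cori.supp.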

Is this mainly because it is an MPI program?  I remember that running
valgrind on Linux with a sequential program was very clean.

Jeff -- if you succeed in building MPICH on Cori, I'd like to use that
to simplify the use of valgrind.

Thanks,
Sherry


#include <stdio.h>
#include <mpi.h>

// compile: cc test.c $VALGRIND_MPI_LINK
// run:  srun -n 1 valgrind --leak-check=full --gen-suppressions=all ./a.out

int main(int argc, char *argv[])
{
    int iam, nprocs;

    MPI_Init( &argc, &argv );

    MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
    MPI_Comm_rank( MPI_COMM_WORLD, &iam );
    printf("iam %d, nprocs %d\n", iam, nprocs); fflush(stdout);

    MPI_Finalize();
    return 0;
}


On Sat, Jan 16, 2016 at 5:42 PM, Jeff Hammond <jeff.science at gmail.com>
wrote:

> Sherry, Satish:
>
> Building MPI from source on Cray machines is not trivial, because Cray MPI
> is proprietary.
>
> However building MPICH on Cray is actually possible now as of a month or
> two ago thanks to libfabric, but there are approximately two people on
> earth who have done it successfully.
>
> If it is critical for debugging to have this at NERSC, I'll share the Cori
> MPI build details once I've got everything in order.
>
> Jeff
>
>
> On Saturday, January 16, 2016, Satish Balay <balay at mcs.anl.gov> wrote:
>
>> On Sat, 16 Jan 2016, Xiaoye S. Li wrote:
>>
>> > By the way, I have a question about valgrind.  Lately I have been
>> > chasing a possible memory corruption on Cori at NERSC.  Valgrind gave
>> > LOTS OF internal system-related warnings, such as:
>> >
>> > ==39059== Conditional jump or move depends on uninitialised value(s)
>> > ==39059==    at 0xD5E336: __register_atfork (register-atfork.c:119)
>> > ==39059==    by 0xD5E418: __libc_pthread_init (libc_pthread_init.c:48)
>> > ==39059==    by 0x6629E1: __pthread_initialize_minimal (nptl-init.c:462)
>> > ==39059==    by 0xD14BDA: (below main) (libc-start.c:152)
>> > ==39059==
>> > ==39057==    by 0xD5E418: __libc_pthread_init (libc_pthread_init.c:48)
>> > ==39057==    by 0x6629E1: __pthread_initialize_minimal (nptl-init.c:462)
>> > ==39057==    by 0xD14BDA: (below main) (libc-start.c:152)
>> >
>> > This has nothing to do with my program.  Do you know a way NOT to
>> > print those?  I got a huge file filled with these, which makes it hard
>> > to find the real error.
>>
>> You can ask valgrind to create a suppression file.
>>
>> [then edit this file - and format it to include only the stuff that you
>> want to suppress]
>>
>> And then - use this file for your next run - to catch the actual issues.
>>
>> https://wiki.wxwidgets.org/Valgrind_Suppression_File_Howto has some of
>> this info..
>>
>>
>> We normally recommend using valgrind on linux with mpich built with
>> '--enable-g=meminit' [--download-mpich option with petsc configure
>> defaults to this mode] - so that it's valgrind clean - and not bother
>> with suppression files.
>>
>> Satish
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>