[petsc-users] calloc with MPI and PetSc

Karl Rupp rupp at iue.tuwien.ac.at
Tue Feb 3 09:58:57 CST 2015


Hi Sanjay,

is this the full output? The errors/warnings are due to OpenMPI, so they 
may not be harmful. You may try building and running with MPICH instead 
to get rid of these. If these are the only errors reported by valgrind, 
can you also try to use malloc instead of calloc?
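
A minimal sketch of that last suggestion, assuming the three arrays only
need "size" zero-initialized int entries each. The helper names
alloc_zeroed_ints and alloc_user_arrays are made up for illustration and
are not part of PETSc or of the original code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n ints with malloc and zero them
   explicitly, which calloc(n, sizeof(int)) would do in one step. */
static int *alloc_zeroed_ints(size_t n)
{
    int *p;

    if (n == 0) return NULL;   /* malloc(0) is implementation-defined */
    p = (int *) malloc(n * sizeof(int));
    if (p != NULL) memset(p, 0, n * sizeof(int));
    return p;
}

/* Hypothetical helper: allocate the three per-rank arrays.
   Returns 0 on success and 1 on failure; on failure nothing
   stays allocated and all three pointers are NULL. */
static int alloc_user_arrays(int size, int **mybase, int **myend,
                             int **myblocksize)
{
    *mybase = *myend = *myblocksize = NULL;
    if (size <= 0) return 1;

    *mybase      = alloc_zeroed_ints((size_t) size);
    *myend       = alloc_zeroed_ints((size_t) size);
    *myblocksize = alloc_zeroed_ints((size_t) size);

    if (!*mybase || !*myend || !*myblocksize) {
        free(*mybase);          /* free(NULL) is a no-op */
        free(*myend);
        free(*myblocksize);
        *mybase = *myend = *myblocksize = NULL;
        return 1;
    }
    return 0;
}
```

In main() this would be called once after MPI_Comm_size, e.g.
alloc_user_arrays(size, &usr_mybase, &usr_myend, &usr_myblocksize),
with the three arrays freed before PetscFinalize. The sketch takes size
as a plain int because MPI_Comm_size and MPI_Comm_rank write through
int pointers (PetscMPIInt in PETSc). If the failure in TSSolve moves or
disappears when the allocation pattern changes like this, that would
point towards uninitialized or overwritten memory elsewhere rather than
towards calloc itself.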

Best regards,
Karli



On 02/03/2015 03:48 PM, Sanjay Kharche wrote:
>
> Hi Karl
>
> You are right - the code is not valgrind-clean even on a single processor. The Valgrind output below shows the line number of the TSSolve in my code.
>
> valgrind ./sk2d
> ==7907== Memcheck, a memory error detector
> ==7907== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
> ==7907== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
> ==7907== Command: ./sk2d
> ==7907==
> ==7907== Invalid read of size 4
> ==7907==    at 0x55985C6: opal_os_dirpath_create (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x553A2C7: orte_session_dir (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x554DAD1: orte_ess_base_app_setup (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x545B584: ??? (in /usr/lib/openmpi/lib/openmpi/mca_ess_singleton.so)
> ==7907==    by 0x552C213: orte_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x54E4FBB: ??? (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x54FE30F: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x4136FAA: PetscInitialize (pinit.c:781)
> ==7907==    by 0x8049448: main (sk2d.c:109)
> ==7907==  Address 0x580e9f4 is 68 bytes inside a block of size 71 alloc'd
> ==7907==    at 0x4006D69: malloc (vg_replace_malloc.c:236)
> ==7907==    by 0x5598542: opal_os_dirpath_create (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x553A2C7: orte_session_dir (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x554DAD1: orte_ess_base_app_setup (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x545B584: ??? (in /usr/lib/openmpi/lib/openmpi/mca_ess_singleton.so)
> ==7907==    by 0x552C213: orte_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x54E4FBB: ??? (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x54FE30F: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.1.0.2)
> ==7907==    by 0x4136FAA: PetscInitialize (pinit.c:781)
> ==7907==    by 0x8049448: main (sk2d.c:109)
> ________________________________________
> From: Karl Rupp [rupp at iue.tuwien.ac.at]
> Sent: 03 February 2015 14:42
> To: Sanjay Kharche; petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] calloc with MPI and PetSc
>
> Hi Sanjay,
>
> this sounds a lot like a memory corruption somewhere in the code. Could
> you please verify first that the code is valgrind-clean? Does the same
> problem show up with one MPI rank?
>
> Best regards,
> Karli
>
>
> On 02/03/2015 03:21 PM, Sanjay Kharche wrote:
>>
>> Dear All
>>
>> I have a code in C that uses PETSc and MPI. My code is an extension of ex15.c in the ts tutorials.
>>
>> I am trying to allocate memory for 3 int arrays, for which I have already declared int pointers. These arrays are not intended for use by the PETSc functions. I am allocating memory using calloc. A single calloc call is fine; however, when I try to allocate memory for 2 or more arrays, the TSSolve(ts,u) gives an error. I found this by including and excluding the TSSolve call. I have tried making the array pointers PetscInt, but with the same result. The first few lines of the error message are also pasted after the relevant code snippet. Can you let me know how I can allocate memory for 3 arrays? These arrays are not relevant to any PETSc functions.
>>
>> thanks
>> Sanjay
>>
>> Relevant code in main():
>>
>>     PetscInt    size = 0;                   /* Petsc/MPI                               */
>>     PetscInt    rank = 0;
>>
>>    int *usr_mybase; // mybase, myend, myblocksize are to be used in non-petsc part of code.
>>    int *usr_myend;
>>    int *usr_myblocksize;
>>     int R_base, transit;
>>     MPI_Status status;
>>     MPI_Request request;
>> /*********************************end of declarations in main************************/
>>     PetscInitialize(&argc,&argv,(char*)0,help);
>>     /* Initialize user application context                              */
>>     user.da           = NULL;
>>     user.boundary     = 1;  /* 0: Dirichlet BC; 1: Neumann BC           */
>>     user.viewJacobian = PETSC_FALSE;
>>
>>     MPI_Comm_size(PETSC_COMM_WORLD, &size);
>>     MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>>
>>     printf("my size is %d, and rank is %d\n",size, rank);
>>
>>      usr_mybase      = (int*) calloc (size,sizeof(int)); // 1st call to calloc is ok.
>> //   usr_myend       = (int*) calloc (size,sizeof(int));  // when I uncomment this call to calloc, TSSolve fails. error below.
>> //   usr_myblocksize = (int*) calloc (size,sizeof(int));
>> .
>> .
>> .
>>      TSSolve(ts,u); // has a problem when I use 2 callocs.
>>
>>
>> The output and error message:
>>
>> mpiexec -n 4 ./sk2d -draw_pause .1 -ts_monitor_draw_solution
>> my size is 4, and rank is 2
>> my size is 4, and rank is 0
>> my size is 4, and rank is 3
>> my size is 4, and rank is 1
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: Floating point exception
>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [1]PETSC ERROR: Floating point exception
>> [1]PETSC ERROR: Vec entry at local location 320 is not-a-number or infinite at beginning of function: Parameter number 2
>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [2]PETSC ERROR: Floating point exception
>> [2]PETSC ERROR: Vec entry at local location 10 is not-a-number or infinite at beginning of function: Parameter number 2
>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [2]PETSC ERROR: Petsc Release Version 3.5.2, unknown
>> [3]PETSC ERROR: [0]PETSC ERROR: Vec entry at local location 293 is not-a-number or infinite at beginning of function: Parameter number 2
>>
>


