[petsc-dev] Including petsc.h breaks user code

William Gropp wgropp at illinois.edu
Mon Sep 15 13:20:55 CDT 2014


On further inspection, the code for MPI_Type_size in MPICH checks for MPI_DATATYPE_NULL.  Is it possible you were using a configuration of MPICH that turned off the error checking?

Bill

On Sep 15, 2014, at 1:11 PM, William Gropp <wgropp at illinois.edu> wrote:

> Actually, MPICH is incorrect here.  NULL objects are an error unless specifically permitted.
> 
> Bill
> 
> On Sep 15, 2014, at 1:08 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>> 
>> Matt,
>> 
>>  I ran with OpenMPI and got exactly the error you'd expect, the same one they reported: "An error occurred in MPI_Type_size". A simple use of the debugger would reveal where it happened. I suspect that MPICH is more generous when you call MPI_Type_size() with a null type; perhaps it just returns a size of zero.
>> 
>>  I hunted around on the web and could not find a definitive statement of what MPI_Type_size() should do when passed an argument of a null datatype.
>> 
>> 
>>  Barry
>> 
>> On Sep 15, 2014, at 4:40 AM, Matthew Knepley <knepley at gmail.com> wrote:
>> 
>>> On Sun, Sep 14, 2014 at 8:36 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> 
>>>  Pierre,
>>> 
>>>    Thanks for reporting this; it is indeed our bug. In petsclog.h we have macros for the various MPI calls in order to log their usage, for example,
>>> 
>>> #define MPI_Scatter(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm) \
>>> ((petsc_scatter_ct++,0) || PetscMPITypeSize(&petsc_recv_len,recvcount,recvtype) || MPI_Scatter(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm))
>>> 
>>> but PetscMPITypeSize() simply called MPI_Type_size(), which generated an MPI error for MPI_DATATYPE_NULL:
>>> 
>>> PETSC_STATIC_INLINE PetscErrorCode PetscMPITypeSize(PetscLogDouble *buff,PetscMPIInt count,MPI_Datatype type)
>>> {
>>>   PetscMPIInt mysize;
>>>   return (MPI_Type_size(type,&mysize) || ((*buff += (PetscLogDouble) (count*mysize)),0));
>>> }
>>> 
>>> What error did you get? Why did I not get this error when I ran it? I ran with MPICH 3.0.4 since that was the one I had compiled for C++.
>>> 
>>> Matt
>>> 
>>> In the branch barry/fix-usage-with-mpidatatypenull I have added a check for this special case that avoids the MPI_Type_size() call. I will put this branch into next; if all tests pass, it will be merged into maint and master and be included in the next patch release.
>>> 
>>>  Thank you for reporting the problem.
>>> 
>>>  Barry
>>> 
>>> Barry still thinks MPI 1.1 is the height of HPC computing :-(
>>> 
>>> 
>>> 
>>> On Sep 14, 2014, at 4:16 PM, Pierre Jolivet <jolivet at ann.jussieu.fr> wrote:
>>> 
>>>> Hello,
>>>> Could you please explain why the following example does not work properly when <petsc.h> (from master, with OpenMPI 1.8.1) is included?
>>>> 
>>>> $ mpicxx in-place.cpp  -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include -L$PETSC_DIR/$PETSC_ARCH/lib -lpetsc
>>>> $ mpirun -np 2 ./a.out
>>>> Done with the scatter !
>>>> 0 0 0 0 (this line should be filled with 0)
>>>> 1 1 1 1 (this line should be filled with 1)
>>>> Done with the gather !
>>>> 
>>>> $ mpicxx in-place.cpp  -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include -L$PETSC_DIR/$PETSC_ARCH/lib -lpetsc -DPETSC_BUG
>>>> $ mpirun -np 2 ./a.out
>>>> [:3367] *** An error occurred in MPI_Type_size
>>>> [:3367] *** reported by process [4819779585,140733193388032]
>>>> [:3367] *** on communicator MPI_COMM_WORLD
>>>> [:3367] *** MPI_ERR_TYPE: invalid datatype
>>>> [:3367] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> [:3367] ***    and potentially your MPI job)
>>>> 
>>>> Thank you for looking,
>>>> Pierre
>>>> 
>>>> <in-place.cpp>
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>> 
> 



