pnetcdf-1.7.0 and MPT

Wei-keng Liao wkliao at eecs.northwestern.edu
Wed Sep 21 18:16:29 CDT 2016


Hi, Jim,

If Eric's approach does not solve your case, please try the following.

From the error messages, I suspect the cause is an MPI internal error:
MPI fails to create zero-length MPI derived datatypes. I found this
problem in OpenMPI, and they fixed it in the latest release.
https://github.com/open-mpi/ompi/issues/1611

FYI, MPICH does not have this problem, but I don't know about SGI MPT.

If the error message you got came from a PnetCDF test program, then there
is a patch to avoid the error.
http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001859.html

Please note that this patch only avoids the error; the fundamental
problem still lies in MPI.
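As a rough illustration of the kind of guard such a workaround can apply (a hypothetical sketch, not the code in the PnetCDF patch linked above), the caller drops empty blocks and skips the derived-datatype constructor entirely when nothing is left. The helper name `compact_blocks` is made up, and `long long` stands in for `MPI_Aint` so the sketch compiles without `mpi.h`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from PnetCDF: compact a list of
 * (displacement, length) blocks in place, dropping zero-length entries,
 * and return how many non-empty blocks remain.
 * Assumption: "long long" stands in for MPI_Aint. */
static int compact_blocks(int count, int *blocklens, long long *disps)
{
    assert(count >= 0);
    int n = 0;
    for (int i = 0; i < count; i++) {
        if (blocklens[i] > 0) {
            blocklens[n] = blocklens[i];
            disps[n] = disps[i];
            n++;
        }
    }
    /* With count == 0 the arrays are never dereferenced, so NULL
     * pointers (like blocklens=0x0 in the traceback) are safe here. */
    return n;
}

/* Intended use, shown only in comments because it needs an MPI library
 * (disps would be a real MPI_Aint array there):
 *
 *   int n = compact_blocks(count, blocklens, disps);
 *   if (n > 0) {
 *       MPI_Type_create_hindexed(n, blocklens, disps, MPI_BYTE, &ftype);
 *       MPI_Type_commit(&ftype);
 *   } else {
 *       ftype = MPI_BYTE;  // nothing to access: issue a 0-count request
 *   }
 */
```

This never hands MPI a zero-count type constructor, so it sidesteps the library bug rather than fixing it.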


Wei-keng

On Sep 21, 2016, at 3:46 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:

> 
> Hi Jim:
> 
> In my case, README.SGI gave me the clue about setting the “LIBS” variable on the command line when running configure:
> 
>             ./configure --prefix=$PREFIX  \
>                 --disable-cxx MPICC="$MPICC" MPIF77="$MPIF77" \
>                 MPIF90="$MPIF90" CC="$CC" F77="$F77" F90="$F90" \
>                 FC="$FC" TEST_SEQRUN="$TEST_SEQRUN" \
>                 TEST_MPIRUN="$TEST_MPIRUN" \
>                 --enable-large-file-test \
>                 LIBS="-lmpi"
> 
> It’s not clear to me why that is necessary (I have MPICC, MPIF77, and MPIF90 set to the SGI MPT wrappers), but it is necessary in my case.
> 
> -Eric
> 
> Eric M. Kemp (SSAI) 
> NASA/GSFC 
> Mail Code: 606 
> Greenbelt, MD 20771 
> 301.286.9768 
> eric.kemp at nasa.gov
> eric.kemp at ssaihq.com
> 
> 
> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Jim Edwards <jedwards at ucar.edu>
> Date: Wednesday, September 21, 2016 at 4:28 PM
> To: Wei-keng Liao <wkliao at eecs.northwestern.edu>
> Cc: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
> Subject: Re: pnetcdf-1.7.0 and MPT
> 
> There wasn't anything useful in README.SGI as far as I could tell. I am exploring getting an update to MPT/2.15, which may solve the problem.
> 
> On Wed, Sep 21, 2016 at 11:53 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>> Thanks - I'll give that a try and let you know.
>> 
>> On Wed, Sep 21, 2016 at 11:50 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>> Hi, Jim, Michael
>>> 
>>> Eric Kemp @ NASA/GSFC also encountered a similar error message.
>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001854.html
>>> 
>>> It seems like he was able to solve the problem by trying the build recipe
>>> in README.SGI. Let me know whether this works for you.
>>> 
>>> Wei-keng
>>> 
>>> On Sep 21, 2016, at 12:36 PM, Michael Raymond wrote:
>>> 
>>> >   Are you passing a count of 0? That’s probably what MPT is getting caught on. I can have a fix for you to try in a few minutes if so.
>>> >
>>> > Michael A. Raymond
>>> > SGI MPT Team Leader
>>> > 1 (651) 683-7523
>>> >
>>> >
>>> >
>>> >> On Sep 21, 2016, at 12:26, Jim Edwards <jedwards at ucar.edu> wrote:
>>> >>
>>> >> Trying to use parallel netcdf on an SGI system with mpi/2.14, I am getting the following error:
>>> >>
>>> >> MPT ERROR: rank:10, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>>> >>
>>> >> with a traceback:
>>> >>
>>> >> MPT: #7  0x00002aaaaf46dd2a in PMPI_Type_create_hindexed (count=<optimized out>,
>>> >> MPT:     blocklens=0x0, indices=0xbf71470, oldtype=27, newtype=0x7fffffff5588)
>>> >> MPT:     at type_create_hindexed.c:23
>>> >> MPT: #8  0x0000000000cc01ac in fillerup_aggregate (ncp=0x4ff9,
>>> >> MPT:     old_ncp=0x7fffffff4990) at fill.c:727
>>> >> MPT: #9  0x0000000000cb5141 in ncmpii_NC_enddef (ncp=0x4ff9,
>>> >> MPT:     h_align=140737488308624, h_minfree=0, v_align=-1, v_minfree=20438,
>>> >> MPT:     r_align=0) at nc.c:1187
>>> >> MPT: #10 0x0000000000cb42fb in ncmpii_enddef (ncp=0x4ff9) at nc.c:1318
>>> >> MPT: #11 0x0000000000ca8d7f in ncmpi_enddef (ncid=20473) at mpinetcdf.c:806
>>> >>
>>> >> Have you seen this before, or do you have an idea for a fix?
>>> >>
>>> >> Thanks,
>>> >>
>>> >> --
>>> >> Jim Edwards
>>> >>
>>> >> CESM Software Engineer
>>> >> National Center for Atmospheric Research
>>> >> Boulder, CO
>>> >
>>> 
>> 
> 