pnetcdf-1.7.0 and MPT

Michael Raymond mraymond at sgi.com
Thu Sep 22 06:39:31 CDT 2016


  It was a bug in SGI MPT. We’re pushing the fix out.

  As for LIBS, I think this might be needed because MPT doesn’t have an mpif77, just an mpif90. In November’s release of MPT, we’ll have an mpif77 symlink to mpif90.
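
  Until a release with that symlink is installed, a build script could select the Fortran 77 wrapper itself. The following is only a minimal sketch under that assumption: the helper name pick_wrapper is hypothetical, and the sketch relies on nothing beyond the POSIX shell's `command -v`.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPT or PnetCDF):
# print the first candidate command that exists on PATH.
pick_wrapper() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer mpif77; fall back to mpif90 when the installation has no mpif77.
if MPIF77=$(pick_wrapper mpif77 mpif90); then
    echo "using MPIF77=$MPIF77"
else
    echo "no MPI Fortran wrapper found on PATH" >&2
fi
```

  The resulting value could then be passed to configure as MPIF77="$MPIF77". Whether LIBS="-lmpi" is still needed on top of that depends on the installation.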

Michael A. Raymond
SGI MPT Team Leader
1 (651) 683-7523



> On Sep 21, 2016, at 18:16, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> 
> Hi, Jim,
> 
> If Eric's approach does not solve your case, please try the following.
> 
> From the error messages, I suspect the cause might be due to an MPI internal
> error that fails to create zero-length MPI derived datatypes. I found this
> problem in OpenMPI and they fixed it in the latest release.
> https://github.com/open-mpi/ompi/issues/1611
> 
> FYI, MPICH does not have this problem, but I don't know about SGI MPT.
> 
> If the error message you got came from a PnetCDF test program, then there
> is a patch to avoid the error.
> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001859.html
> 
> Please note that this patch does not solve the problem; the fundamental
> problem still lies in MPI.
> 
> 
> Wei-keng
> 
> On Sep 21, 2016, at 3:46 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:
> 
>> 
>> Hi Jim:
>> 
>> In my case, README.SGI gave me the clue about setting the “LIBS” variable on the command line when running configure:
>> 
>>            ./configure --prefix=$PREFIX  \
>>                --disable-cxx MPICC="$MPICC" MPIF77="$MPIF77" \
>>                MPIF90="$MPIF90" CC="$CC" F77="$F77" F90="$F90" \
>>                FC="$FC" TEST_SEQRUN="$TEST_SEQRUN" \
>>                TEST_MPIRUN="$TEST_MPIRUN" \
>>                --enable-large-file-test \
>>                LIBS="-lmpi"
>> 
>> It’s not clear to me why that is necessary (I have MPICC, MPIF77, and MPIF90 set to the SGI MPT wrappers), but it is necessary in my case.
>> 
>> -Eric
>> 
>> Eric M. Kemp (SSAI) 
>> NASA/GSFC 
>> Mail Code: 606 
>> Greenbelt, MD 20771 
>> 301.286.9768 
>> eric.kemp at nasa.gov
>> eric.kemp at ssaihq.com
>> 
>> 
>> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Jim Edwards <jedwards at ucar.edu>
>> Date: Wednesday, September 21, 2016 at 4:28 PM
>> To: Wei-keng Liao <wkliao at eecs.northwestern.edu>
>> Cc: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
>> Subject: Re: pnetcdf-1.7.0 and MPT
>> 
>> There wasn't anything useful in the README.SGI as far as I could tell.  I am exploring getting an update to MPT/2.15 which may solve the problem.
>> 
>> On Wed, Sep 21, 2016 at 11:53 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>> Thanks - I'll give that a try and let you know.
>>> 
>>> On Wed, Sep 21, 2016 at 11:50 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>> Hi, Jim, Michael
>>>> 
>>>> Eric Kemp @ NASA/GSFC also encountered a similar error message.
>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001854.html
>>>> 
>>>> It seems like he was able to solve the problem by trying the build recipe
>>>> in README.SGI. Let me know whether this works for you.
>>>> 
>>>> Wei-keng
>>>> 
>>>> On Sep 21, 2016, at 12:36 PM, Michael Raymond wrote:
>>>> 
>>>>>  Are you passing a count of 0? That’s probably what MPT is getting caught on. I can have a fix for you to try in a few minutes if so.
>>>>> 
>>>>> Michael A. Raymond
>>>>> SGI MPT Team Leader
>>>>> 1 (651) 683-7523
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Sep 21, 2016, at 12:26, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>>> 
>>>>>> Trying to use parallel netcdf on an SGI system with mpi/2.14 I am getting the following error:
>>>>>> 
>>>>>> MPT ERROR: rank:10, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>>>>>> 
>>>>>> with a traceback:
>>>>>> 
>>>>>> MPT: #7  0x00002aaaaf46dd2a in PMPI_Type_create_hindexed (count=<optimized out>,
>>>>>> MPT:     blocklens=0x0, indices=0xbf71470, oldtype=27, newtype=0x7fffffff5588)
>>>>>> MPT:     at type_create_hindexed.c:23
>>>>>> MPT: #8  0x0000000000cc01ac in fillerup_aggregate (ncp=0x4ff9,
>>>>>> MPT:     old_ncp=0x7fffffff4990) at fill.c:727
>>>>>> MPT: #9  0x0000000000cb5141 in ncmpii_NC_enddef (ncp=0x4ff9,
>>>>>> MPT:     h_align=140737488308624, h_minfree=0, v_align=-1, v_minfree=20438,
>>>>>> MPT:     r_align=0) at nc.c:1187
>>>>>> MPT: #10 0x0000000000cb42fb in ncmpii_enddef (ncp=0x4ff9) at nc.c:1318
>>>>>> MPT: #11 0x0000000000ca8d7f in ncmpi_enddef (ncid=20473) at mpinetcdf.c:806
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Have you seen this before or have an idea of a fix?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Jim Edwards
>>>>>> 
>>>>>> CESM Software Engineer
>>>>>> National Center for Atmospheric Research
>>>>>> Boulder, CO
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Jim Edwards
>>> 
>>> CESM Software Engineer
>>> National Center for Atmospheric Research
>>> Boulder, CO 
>> 
>> 
>> 
>> -- 
>> Jim Edwards
>> 
>> CESM Software Engineer
>> National Center for Atmospheric Research
>> Boulder, CO 
> 


