pnetcdf-1.7.0 and MPT

Michael Raymond mraymond at sgi.com
Thu Sep 22 12:12:19 CDT 2016


  SGI MPI 1.13, which contains SGI MPT 2.15.

Michael A. Raymond
SGI MPT Team Leader
1 (651) 683-7523



> On Sep 22, 2016, at 12:05, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> 
> Thanks, Michael.
> 
> Could you please let me know the version number of MPT to be released in November?
> I would like to add a note in PnetCDF's README file.
> 
> Wei-keng
> 
> On Sep 22, 2016, at 6:39 AM, Michael Raymond wrote:
> 
>> It was a bug in SGI MPT. We’re pushing the fix out.
>> 
>> As to LIBS, I think this might be needed because MPT doesn’t have an mpif77 wrapper, just mpif90. In November’s release of MPT, we’ll have an mpif77 symlink to mpif90.
>> 
>> Michael A. Raymond
>> SGI MPT Team Leader
>> 1 (651) 683-7523
>> 
>> 
>> 
>>> On Sep 21, 2016, at 18:16, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>> 
>>> Hi, Jim,
>>> 
>>> If Eric's approach does not solve your case, please try the following.
>>> 
>>> From the error messages, I suspect the cause is an internal MPI error
>>> that occurs when creating zero-length MPI derived datatypes. I found this
>>> problem in OpenMPI, and they fixed it in the latest release.
>>> https://github.com/open-mpi/ompi/issues/1611
>>> 
>>> FYI, MPICH does not have this problem, but I don't know about SGI MPT.
>>> 
>>> If the error message you got came from a PnetCDF test program, then there
>>> is a patch to avoid the error.
>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001859.html
>>> 
>>> Please note that this patch does not solve the problem; the fundamental
>>> problem still lies in MPI.
>>> 
>>> 
>>> Wei-keng
>>> 
>>> On Sep 21, 2016, at 3:46 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:
>>> 
>>>> 
>>>> Hi Jim:
>>>> 
>>>> In my case, README.SGI gave me the clue about setting the “LIBS” variable on the command line when running configure:
>>>> 
>>>>          ./configure --prefix=$PREFIX  \
>>>>              --disable-cxx MPICC="$MPICC" MPIF77="$MPIF77" \
>>>>              MPIF90="$MPIF90" CC="$CC" F77="$F77" F90="$F90" \
>>>>              FC="$FC" TEST_SEQRUN="$TEST_SEQRUN" \
>>>>              TEST_MPIRUN="$TEST_MPIRUN" \
>>>>              --enable-large-file-test \
>>>>              LIBS="-lmpi"
>>>> 
>>>> It’s not clear to me why that is necessary (I have MPICC, MPIF77, and MPIF90 set to the SGI MPT wrappers), but it is necessary in my case.
>>>> 
>>>> -Eric
>>>> 
>>>> Eric M. Kemp (SSAI) 
>>>> NASA/GSFC 
>>>> Mail Code: 606 
>>>> Greenbelt, MD 20771 
>>>> 301.286.9768 
>>>> eric.kemp at nasa.gov
>>>> eric.kemp at ssaihq.com
>>>> 
>>>> 
>>>> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Jim Edwards <jedwards at ucar.edu>
>>>> Date: Wednesday, September 21, 2016 at 4:28 PM
>>>> To: Wei-keng Liao <wkliao at eecs.northwestern.edu>
>>>> Cc: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
>>>> Subject: Re: pnetcdf-1.7.0 and MPT
>>>> 
>>>> There wasn't anything useful in the README.SGI as far as I could tell.  I am exploring getting an update to MPT/2.15 which may solve the problem.
>>>> 
>>>> On Wed, Sep 21, 2016 at 11:53 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>> Thanks - I'll give that a try and let you know.
>>>>> 
>>>>> On Wed, Sep 21, 2016 at 11:50 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>>> Hi, Jim, Michael
>>>>>> 
>>>>>> Eric Kemp @ NASA/GSFC also encountered a similar error message.
>>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001854.html
>>>>>> 
>>>>>> It seems like he was able to solve the problem by trying the build recipe
>>>>>> in README.SGI. Let me know whether this works for you.
>>>>>> 
>>>>>> Wei-keng
>>>>>> 
>>>>>> On Sep 21, 2016, at 12:36 PM, Michael Raymond wrote:
>>>>>> 
>>>>>>> Are you passing a count of 0? That’s probably what MPT is getting caught on. I can have a fix for you to try in a few minutes if so.
>>>>>>> 
>>>>>>> Michael A. Raymond
>>>>>>> SGI MPT Team Leader
>>>>>>> 1 (651) 683-7523
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sep 21, 2016, at 12:26, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>>>>> 
>>>>>>>> Trying to use parallel netcdf on an SGI system with mpi/2.14 I am getting the following error:
>>>>>>>> 
>>>>>>>> MPT ERROR: rank:10, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>>>>>>>> 
>>>>>>>> with a traceback:
>>>>>>>> 
>>>>>>>> MPT: #7  0x00002aaaaf46dd2a in PMPI_Type_create_hindexed (count=<optimized out>,
>>>>>>>> MPT:     blocklens=0x0, indices=0xbf71470, oldtype=27, newtype=0x7fffffff5588)
>>>>>>>> MPT:     at type_create_hindexed.c:23
>>>>>>>> MPT: #8  0x0000000000cc01ac in fillerup_aggregate (ncp=0x4ff9,
>>>>>>>> MPT:     old_ncp=0x7fffffff4990) at fill.c:727
>>>>>>>> MPT: #9  0x0000000000cb5141 in ncmpii_NC_enddef (ncp=0x4ff9,
>>>>>>>> MPT:     h_align=140737488308624, h_minfree=0, v_align=-1, v_minfree=20438,
>>>>>>>> MPT:     r_align=0) at nc.c:1187
>>>>>>>> MPT: #10 0x0000000000cb42fb in ncmpii_enddef (ncp=0x4ff9) at nc.c:1318
>>>>>>>> MPT: #11 0x0000000000ca8d7f in ncmpi_enddef (ncid=20473) at mpinetcdf.c:806
>>>>>>>> 
>>>>>>>> Have you seen this before or have an idea of a fix?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jim Edwards
>>>>>>>> 
>>>>>>>> CESM Software Engineer
>>>>>>>> National Center for Atmospheric Research
>>>>>>>> Boulder, CO
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jim Edwards
>>>>> 
>>>>> CESM Software Engineer
>>>>> National Center for Atmospheric Research
>>>>> Boulder, CO 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jim Edwards
>>>> 
>>>> CESM Software Engineer
>>>> National Center for Atmospheric Research
>>>> Boulder, CO 
>>> 
>> 
> 
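
Note on the mpif77 point discussed above: until an MPT release ships its own
mpif77 symlink, one possible stopgap is to create the link in a user-owned
directory. The following is a minimal sketch and is not taken from the thread.
The function name link_mpif77 and the directory layout are hypothetical, and
it assumes the MPT mpif90 wrapper behaves the same when invoked under the
name mpif77.

```shell
#!/bin/sh
# Hypothetical helper: expose an existing mpif90 wrapper under the name mpif77.
# Usage: link_mpif77 /path/to/mpif90 /path/to/user/bin
link_mpif77() {
    mpif90_path=$1
    bin_dir=$2
    # Stop early if the Fortran wrapper is missing or not executable.
    if [ ! -x "$mpif90_path" ]; then
        echo "mpif90 not found or not executable: $mpif90_path" >&2
        return 1
    fi
    # Create the target directory if it does not exist yet.
    mkdir -p "$bin_dir" || return 1
    # -s: make a symbolic link; -f: replace a stale link from an earlier run.
    ln -sf "$mpif90_path" "$bin_dir/mpif77"
}
```

A possible invocation, assuming mpif90 is on PATH, would be
link_mpif77 "$(command -v mpif90)" "$HOME/bin", followed by passing
MPIF77="$HOME/bin/mpif77" to configure. Whether this removes the need for
LIBS="-lmpi" has not been verified here.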


