pnetcdf-1.7.0 and MPT

Wei-keng Liao wkliao at eecs.northwestern.edu
Thu Sep 22 12:05:22 CDT 2016


Thanks, Michael.

Could you please let me know the version number of MPT to be released in November?
I would like to add a note in PnetCDF's README file.

Wei-keng

On Sep 22, 2016, at 6:39 AM, Michael Raymond wrote:

> It was a bug in SGI MPT. We’re pushing the fix out.
> 
> As to LIBS, I think this might be needed because MPT doesn’t have an mpif77, just an mpif90. In November’s release of MPT, we’ll have an mpif77 symlink to mpif90.
> 
> Michael A. Raymond
> SGI MPT Team Leader
> 1 (651) 683-7523
> 
> 
> 
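A minimal shell sketch of the workaround described above, assuming an SGI MPT installation that ships mpif90 but no mpif77. The helper pick_mpif77 is hypothetical (not part of PnetCDF or MPT): it falls back to mpif90 when mpif77 is not on PATH, and the result is then passed to configure together with the LIBS="-lmpi" setting from Eric's recipe. This is an illustration under those assumptions, not a documented PnetCDF or MPT procedure.

```shell
#!/bin/sh
# Hypothetical helper (not part of PnetCDF or MPT): choose a Fortran 77
# MPI compiler wrapper. Assumption: on MPT releases without mpif77,
# mpif90 can stand in for it, as the planned mpif77 -> mpif90 symlink implies.
pick_mpif77() {
    if command -v mpif77 >/dev/null 2>&1; then
        echo mpif77
    else
        echo mpif90
    fi
}

# Respect a value the user already set; otherwise pick one.
MPIF77=${MPIF77:-$(pick_mpif77)}
echo "Using MPIF77=$MPIF77"

# Then configure as in Eric's recipe, e.g.:
#   ./configure --prefix="$PREFIX" MPICC="$MPICC" MPIF77="$MPIF77" \
#       MPIF90="$MPIF90" LIBS="-lmpi"
```

Once an MPT release provides the mpif77 symlink, the fallback branch is simply never taken.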
>> On Sep 21, 2016, at 18:16, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>> 
>> Hi, Jim,
>> 
>> If Eric's approach does not solve your case, please try the following.
>> 
>> From the error messages, I suspect the cause is an MPI internal
>> error that fails to create zero-length MPI derived datatypes. I found this
>> problem in OpenMPI and they fixed it in the latest release.
>> https://github.com/open-mpi/ompi/issues/1611
>> 
>> FYI, MPICH does not have this problem, but I don't know about SGI MPT.
>> 
>> If the error message you got came from a PnetCDF test program, then there
>> is a patch to avoid the error.
>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001859.html
>> 
>> Please note that this patch only works around the error; the fundamental
>> problem still lies in MPI.
>> 
>> 
>> Wei-keng
>> 
>> On Sep 21, 2016, at 3:46 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:
>> 
>>> 
>>> Hi Jim:
>>> 
>>> In my case, README.SGI gave me the clue about setting the “LIBS” variable on the command line when running configure:
>>> 
>>>           ./configure --prefix=$PREFIX \
>>>               --disable-cxx MPICC="$MPICC" MPIF77="$MPIF77" \
>>>               MPIF90="$MPIF90" CC="$CC" F77="$F77" F90="$F90" \
>>>               FC="$FC" TEST_SEQRUN="$TEST_SEQRUN" \
>>>               TEST_MPIRUN="$TEST_MPIRUN" \
>>>               --enable-large-file-test \
>>>               LIBS="-lmpi"
>>> 
>>> It’s not clear to me why that is necessary (I have MPICC, MPIF77, and MPIF90 set to the SGI MPT wrappers), but it is necessary in my case.
>>> 
>>> -Eric
>>> 
>>> Eric M. Kemp (SSAI) 
>>> NASA/GSFC 
>>> Mail Code: 606 
>>> Greenbelt, MD 20771 
>>> 301.286.9768 
>>> eric.kemp at nasa.gov
>>> eric.kemp at ssaihq.com
>>> 
>>> 
>>> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Jim Edwards <jedwards at ucar.edu>
>>> Date: Wednesday, September 21, 2016 at 4:28 PM
>>> To: Wei-keng Liao <wkliao at eecs.northwestern.edu>
>>> Cc: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
>>> Subject: Re: pnetcdf-1.7.0 and MPT
>>> 
>>> There wasn't anything useful in the README.SGI as far as I could tell. I am exploring getting an update to MPT/2.15, which may solve the problem.
>>> 
>>> On Wed, Sep 21, 2016 at 11:53 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>> Thanks - I'll give that a try and let you know.
>>>> 
>>>> On Wed, Sep 21, 2016 at 11:50 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>> Hi, Jim, Michael
>>>>> 
>>>>> Eric Kemp @ NASA/GSFC also encountered a similar error message.
>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001854.html
>>>>> 
>>>>> It seems like he was able to solve the problem by trying the build recipe
>>>>> in README.SGI. Let me know whether this works for you.
>>>>> 
>>>>> Wei-keng
>>>>> 
>>>>> On Sep 21, 2016, at 12:36 PM, Michael Raymond wrote:
>>>>> 
>>>>>> Are you passing a count of 0? That’s probably what MPT is getting caught on. I can have a fix for you to try in a few minutes if so.
>>>>>> 
>>>>>> Michael A. Raymond
>>>>>> SGI MPT Team Leader
>>>>>> 1 (651) 683-7523
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Sep 21, 2016, at 12:26, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>>>> 
>>>>>>> Trying to use parallel netcdf on an SGI system with mpi/2.14, I am getting the following error:
>>>>>>> 
>>>>>>> MPT ERROR: rank:10, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>>>>>>> 
>>>>>>> with a traceback:
>>>>>>> 
>>>>>>> MPT: #7  0x00002aaaaf46dd2a in PMPI_Type_create_hindexed (count=<optimized out>,
>>>>>>> MPT:     blocklens=0x0, indices=0xbf71470, oldtype=27, newtype=0x7fffffff5588)
>>>>>>> MPT:     at type_create_hindexed.c:23
>>>>>>> MPT: #8  0x0000000000cc01ac in fillerup_aggregate (ncp=0x4ff9,
>>>>>>> MPT:     old_ncp=0x7fffffff4990) at fill.c:727
>>>>>>> MPT: #9  0x0000000000cb5141 in ncmpii_NC_enddef (ncp=0x4ff9,
>>>>>>> MPT:     h_align=140737488308624, h_minfree=0, v_align=-1, v_minfree=20438,
>>>>>>> MPT:     r_align=0) at nc.c:1187
>>>>>>> MPT: #10 0x0000000000cb42fb in ncmpii_enddef (ncp=0x4ff9) at nc.c:1318
>>>>>>> MPT: #11 0x0000000000ca8d7f in ncmpi_enddef (ncid=20473) at mpinetcdf.c:806
>>>>>>> 
>>>>>>> Have you seen this before, or do you have an idea for a fix?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> --
>>>>>>> Jim Edwards
>>>>>>> 
>>>>>>> CESM Software Engineer
>>>>>>> National Center for Atmospheric Research
>>>>>>> Boulder, CO
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jim Edwards
>>>> 
>>>> CESM Software Engineer
>>>> National Center for Atmospheric Research
>>>> Boulder, CO 
>>> 
>>> 
>>> 
>>> -- 
>>> Jim Edwards
>>> 
>>> CESM Software Engineer
>>> National Center for Atmospheric Research
>>> Boulder, CO 
>> 
> 



More information about the parallel-netcdf mailing list