pnetcdf-1.7.0 and MPT
Wei-keng Liao
wkliao at eecs.northwestern.edu
Thu Sep 22 12:05:22 CDT 2016
Thanks, Michael.
Could you please let me know the version number of MPT to be released in November?
I would like to add a note in PnetCDF's README file.
Wei-keng
On Sep 22, 2016, at 6:39 AM, Michael Raymond wrote:
> It was a bug in SGI MPT. We’re pushing the fix out.
>
> As to LIBS, I think setting it might be needed because MPT doesn’t have an mpif77 wrapper, just an mpif90. In November’s release of MPT, we’ll have an mpif77 symlink to mpif90.
>
> Michael A. Raymond
> SGI MPT Team Leader
> 1 (651) 683-7523
>
>
>
>> On Sep 21, 2016, at 18:16, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>
>> Hi, Jim,
>>
>> If Eric's approach does not solve your case, please try the following.
>>
>> From the error messages, I suspect the cause is an MPI-internal error:
>> the library fails to create zero-length MPI derived datatypes. I found this
>> problem in OpenMPI, and they fixed it in the latest release.
>> https://github.com/open-mpi/ompi/issues/1611
>>
>> FYI, MPICH does not have this problem, but I don't know about SGI MPT.
>>
>> If the error message you got came from a PnetCDF test program, then there
>> is a patch to avoid the error.
>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001859.html
>>
>> Please note that this patch only avoids the error in the test program;
>> the fundamental problem still lies in MPI.
>>
>>
>> Wei-keng
>>
>> On Sep 21, 2016, at 3:46 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:
>>
>>>
>>> Hi Jim:
>>>
>>> In my case, README.SGI gave me the clue about setting the “LIBS” variable on the command line when running configure:
>>>
>>> ./configure --prefix=$PREFIX \
>>> --disable-cxx MPICC="$MPICC" MPIF77="$MPIF77" \
>>> MPIF90="$MPIF90" CC="$CC" F77="$F77" F90="$F90" \
>>> FC="$FC" TEST_SEQRUN="$TEST_SEQRUN" \
>>> TEST_MPIRUN="$TEST_MPIRUN" \
>>> --enable-large-file-test \
>>> LIBS="-lmpi"
>>>
>>> It’s not clear to me why that is necessary (I have MPICC, MPIF77, and MPIF90 set to the SGI MPT wrappers), but it is necessary in my case.
>>>
>>> -Eric
>>>
>>> Eric M. Kemp (SSAI)
>>> NASA/GSFC
>>> Mail Code: 606
>>> Greenbelt, MD 20771
>>> 301.286.9768
>>> eric.kemp at nasa.gov
>>> eric.kemp at ssaihq.com
>>>
>>>
>>> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Jim Edwards <jedwards at ucar.edu>
>>> Date: Wednesday, September 21, 2016 at 4:28 PM
>>> To: Wei-keng Liao <wkliao at eecs.northwestern.edu>
>>> Cc: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
>>> Subject: Re: pnetcdf-1.7.0 and MPT
>>>
>>> There wasn't anything useful in README.SGI as far as I could tell. I am looking into getting an update to MPT/2.15, which may solve the problem.
>>>
>>> On Wed, Sep 21, 2016 at 11:53 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>> Thanks - I'll give that a try and let you know.
>>>>
>>>> On Wed, Sep 21, 2016 at 11:50 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>> Hi, Jim, Michael
>>>>>
>>>>> Eric Kemp @ NASA/GSFC also encountered a similar error message.
>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001854.html
>>>>>
>>>>> It seems like he was able to solve the problem by trying the build recipe
>>>>> in README.SGI. Let me know whether this works for you.
>>>>>
>>>>> Wei-keng
>>>>>
>>>>> On Sep 21, 2016, at 12:36 PM, Michael Raymond wrote:
>>>>>
>>>>>> Are you passing a count of 0? That’s probably what MPT is getting caught on. I can have a fix for you to try in a few minutes if so.
>>>>>>
>>>>>>> On Sep 21, 2016, at 12:26, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>>>>
>>>>>>> Trying to use parallel netcdf on an SGI system with mpi/2.14 I am getting the following error:
>>>>>>>
>>>>>>> MPT ERROR: rank:10, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>>>>>>>
>>>>>>> with a traceback:
>>>>>>>
>>>>>>> MPT: #7 0x00002aaaaf46dd2a in PMPI_Type_create_hindexed (count=<optimized out>,
>>>>>>> MPT: blocklens=0x0, indices=0xbf71470, oldtype=27, newtype=0x7fffffff5588)
>>>>>>> MPT: at type_create_hindexed.c:23
>>>>>>> MPT: #8 0x0000000000cc01ac in fillerup_aggregate (ncp=0x4ff9,
>>>>>>> MPT: old_ncp=0x7fffffff4990) at fill.c:727
>>>>>>> MPT: #9 0x0000000000cb5141 in ncmpii_NC_enddef (ncp=0x4ff9,
>>>>>>> MPT: h_align=140737488308624, h_minfree=0, v_align=-1, v_minfree=20438,
>>>>>>> MPT: r_align=0) at nc.c:1187
>>>>>>> MPT: #10 0x0000000000cb42fb in ncmpii_enddef (ncp=0x4ff9) at nc.c:1318
>>>>>>> MPT: #11 0x0000000000ca8d7f in ncmpi_enddef (ncid=20473) at mpinetcdf.c:806
>>>>>>>
>>>>>>> Have you seen this before or have an idea of a fix?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> --
>>>>>>> Jim Edwards
>>>>>>>
>>>>>>> CESM Software Engineer
>>>>>>> National Center for Atmospheric Research
>>>>>>> Boulder, CO
>>>>>>
>>>>>
>>>>
>>>
>>
>
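
Michael's reply in this thread says that MPT ships an mpif90 wrapper but no
mpif77, and that the November release will add an mpif77 symlink to mpif90.
On a system without that release, the same arrangement can be approximated
in a private directory. What follows is a minimal sketch under stated
assumptions, not a tested recipe: the function name ensure_mpif77 and its
directory argument are hypothetical, and nothing in the thread confirms
that having an mpif77 wrapper removes the need for LIBS="-lmpi".

```shell
# ensure_mpif77 DIR
# Hypothetical helper (not part of PnetCDF or MPT). If an mpif77 wrapper is
# already on PATH, print its location. Otherwise, if mpif90 is on PATH,
# create DIR/mpif77 as a symlink to it, mimicking the symlink that the
# November MPT release is said to provide, and print that path.
# Returns non-zero if neither wrapper can be found or the link fails.
ensure_mpif77() {
    dir=$1
    if command -v mpif77 >/dev/null 2>&1; then
        command -v mpif77                  # already available, nothing to do
        return 0
    fi
    f90=$(command -v mpif90) || return 1   # no Fortran MPI wrapper at all
    mkdir -p "$dir" || return 1
    ln -sf "$f90" "$dir/mpif77" || return 1
    printf '%s\n' "$dir/mpif77"
}
```

The printed path could then be passed as the MPIF77 value in the configure
command Eric shows earlier in the thread. Until it has been verified on the
target system, it seems safer to keep LIBS="-lmpi" on that command line as
well.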