pnetcdf-1.7.0 and MPT
Michael Raymond
mraymond at sgi.com
Thu Sep 22 12:12:19 CDT 2016
SGI MPI 1.13, which contains SGI MPT 2.15.
Michael A. Raymond
SGI MPT Team Leader
1 (651) 683-7523
> On Sep 22, 2016, at 12:05, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>
> Thanks, Michael.
>
> Could you please let me know the version number of MPT to be released in November?
> I would like to add a note in PnetCDF's README file.
>
> Wei-keng
>
> On Sep 22, 2016, at 6:39 AM, Michael Raymond wrote:
>
>> It was a bug in SGI MPT. We’re pushing the fix out.
>>
>> As to LIBS, I think this might be needed because MPT doesn’t have an mpif77, just an mpif90. In November’s release of MPT, we’ll have an mpif77 symlink to mpif90.
>>
>> Michael A. Raymond
>> SGI MPT Team Leader
>> 1 (651) 683-7523
>>
>>
>>
>>> On Sep 21, 2016, at 18:16, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>
>>> Hi, Jim,
>>>
>>> If Eric's approach does not solve your case, please try the following.
>>>
>>> From the error messages, I suspect the cause is an internal MPI error:
>>> the library fails to create zero-length MPI derived datatypes. I found this
>>> problem in OpenMPI and they fixed it in the latest release.
>>> https://github.com/open-mpi/ompi/issues/1611
>>>
>>> FYI, MPICH does not have this problem, but I don't know about SGI MPT.
>>>
>>> If the error message you got came from a PnetCDF test program, then there
>>> is a patch to avoid the error.
>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001859.html
>>>
>>> Please note this patch does not solve the problem; the fundamental
>>> problem still lies in MPI.
>>>
>>>
>>> Wei-keng
>>>
>>> On Sep 21, 2016, at 3:46 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:
>>>
>>>>
>>>> Hi Jim:
>>>>
>>>> In my case, README.SGI gave me the clue about setting the “LIBS” variable on the command line when running configure:
>>>>
>>>> ./configure --prefix=$PREFIX \
>>>> --disable-cxx MPICC="$MPICC" MPIF77="$MPIF77" \
>>>> MPIF90="$MPIF90" CC="$CC" F77="$F77" F90="$F90" \
>>>> FC="$FC" TEST_SEQRUN="$TEST_SEQRUN" \
>>>> TEST_MPIRUN="$TEST_MPIRUN" \
>>>> --enable-large-file-test \
>>>> LIBS="-lmpi"
>>>>
>>>> It’s not clear to me why that is necessary (I have MPICC, MPIF77, and MPIF90 set to the SGI MPT wrappers), but it is necessary in my case.
>>>>
>>>> -Eric
>>>>
>>>> Eric M. Kemp (SSAI)
>>>> NASA/GSFC
>>>> Mail Code: 606
>>>> Greenbelt, MD 20771
>>>> 301.286.9768
>>>> eric.kemp at nasa.gov
>>>> eric.kemp at ssaihq.com
>>>>
>>>>
>>>> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Jim Edwards <jedwards at ucar.edu>
>>>> Date: Wednesday, September 21, 2016 at 4:28 PM
>>>> To: Wei-keng Liao <wkliao at eecs.northwestern.edu>
>>>> Cc: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
>>>> Subject: Re: pnetcdf-1.7.0 and MPT
>>>>
>>>> There wasn't anything useful in the README.SGI as far as I could tell. I am exploring getting an update to MPT/2.15 which may solve the problem.
>>>>
>>>> On Wed, Sep 21, 2016 at 11:53 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>> Thanks - I'll give that a try and let you know.
>>>>>
>>>>> On Wed, Sep 21, 2016 at 11:50 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>>>>> Hi, Jim, Michael
>>>>>>
>>>>>> Eric Kemp @ NASA/GSFC also encountered a similar error message.
>>>>>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-June/001854.html
>>>>>>
>>>>>> It seems like he was able to solve the problem by trying the build recipe
>>>>>> in README.SGI. Let me know whether this works for you.
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>> On Sep 21, 2016, at 12:36 PM, Michael Raymond wrote:
>>>>>>
>>>>>>> Are you passing a count of 0? That’s probably what MPT is getting caught on. I can have a fix for you to try in a few minutes if so.
>>>>>>>
>>>>>>> Michael A. Raymond
>>>>>>> SGI MPT Team Leader
>>>>>>> 1 (651) 683-7523
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 21, 2016, at 12:26, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>>>>>
>>>>>>>> Trying to use parallel netcdf on an SGI system with mpi/2.14 I am getting the following error:
>>>>>>>>
>>>>>>>> MPT ERROR: rank:10, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>>>>>>>>
>>>>>>>> with a traceback:
>>>>>>>>
>>>>>>>> MPT: #7 0x00002aaaaf46dd2a in PMPI_Type_create_hindexed (count=<optimized out>,
>>>>>>>> MPT:     blocklens=0x0, indices=0xbf71470, oldtype=27, newtype=0x7fffffff5588)
>>>>>>>> MPT:     at type_create_hindexed.c:23
>>>>>>>> MPT: #8 0x0000000000cc01ac in fillerup_aggregate (ncp=0x4ff9,
>>>>>>>> MPT:     old_ncp=0x7fffffff4990) at fill.c:727
>>>>>>>> MPT: #9 0x0000000000cb5141 in ncmpii_NC_enddef (ncp=0x4ff9,
>>>>>>>> MPT:     h_align=140737488308624, h_minfree=0, v_align=-1, v_minfree=20438,
>>>>>>>> MPT:     r_align=0) at nc.c:1187
>>>>>>>> MPT: #10 0x0000000000cb42fb in ncmpii_enddef (ncp=0x4ff9) at nc.c:1318
>>>>>>>> MPT: #11 0x0000000000ca8d7f in ncmpi_enddef (ncid=20473) at mpinetcdf.c:806
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you seen this before or have an idea of a fix?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jim Edwards
>>>>>>>>
>>>>>>>> CESM Software Engineer
>>>>>>>> National Center for Atmospheric Research
>>>>>>>> Boulder, CO
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
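
The thread above traces the failure to MPI_Type_create_hindexed being called with a count of 0 (note blocklens=0x0 in frame #7). Until a fixed MPT is installed, an application can avoid the call by checking for the empty case first. The following is a minimal sketch of that defensive pattern, not the PnetCDF patch and not SGI's fix. The helper name compact_nonempty_blocks is hypothetical, and `long` stands in for MPI_Aint so that the sketch compiles without mpi.h. The MPI calls appear only in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PnetCDF or MPT): removes zero-length
 * blocks in place and returns how many blocks remain.
 * Assumption: `long` stands in for MPI_Aint; real code would use MPI_Aint. */
int compact_nonempty_blocks(int count, int *blocklens, long *disps)
{
    assert(count >= 0);
    int kept = 0;
    for (int i = 0; i < count; i++) {
        if (blocklens[i] > 0) {
            /* Keep this block, shifting it down over any dropped entries. */
            blocklens[kept] = blocklens[i];
            disps[kept] = disps[i];
            kept++;
        }
    }
    return kept;
}

/* Intended use around the MPI call (comment only, since it needs mpi.h):
 *
 *   int n = compact_nonempty_blocks(count, blocklens, disps);
 *   if (n == 0) {
 *       // Nothing to transfer: use a predefined type and a count of 0
 *       // in the I/O call instead of building a derived datatype.
 *       buftype = MPI_BYTE;
 *   } else {
 *       MPI_Type_create_hindexed(n, blocklens, disps, MPI_BYTE, &buftype);
 *       MPI_Type_commit(&buftype);
 *   }
 */
```

This only sidesteps the error on the caller's side. As noted in the thread, the underlying problem is in the MPI library, and a count of 0 should be accepted by a conforming implementation.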