[petsc-users] Problem with PETSc + HDF5 VecView

Åsmund Ervik asmund.ervik at ntnu.no
Thu Nov 27 05:45:30 CST 2014


Hello Håkon,

I have to express my sincere thanks for digging in and solving this one. We have a code that uses HDF5 and 3D DMDAs, which has been running on smaller clusters and will be running next year on Vilje. (I assume you are using Vilje?) So we would have hit this problem, and you've saved us quite a lot of debugging. I'm in London for the next few months, but I'll make sure to buy you a coffee some time when I'm back in Moustache City.

Best regards,
Åsmund


>Date: Wed, 26 Nov 2014 22:35:55 +0100
>From: Håkon Strandenes <haakon at hakostra.net>
>To: Barry Smith <bsmith at mcs.anl.gov>
>Cc: petsc-users at mcs.anl.gov
>Subject: Re: [petsc-users] Problem with PETSc + HDF5 VecView
>Message-ID: <547647BB.1080103 at hakostra.net>
>Content-Type: text/plain; charset=utf-8; format=flowed
>
>
>
>On 26. nov. 2014 18:23, Barry Smith wrote:
>
>> On Nov 26, 2014, at 6:26 AM, Håkon Strandenes <haakon at hakostra.net> wrote:
>>
>> My local HPC group has found a solution to this problem:
>> On MPT it is possible to set an environment variable, MPI_TYPE_DEPTH, with a default value of 8. This variable limits the maximum depth of derived datatypes that an application can create.
>
>     Is the variable MPI_TYPE_DEPTH  actually set in the environment to 8 (by default) or does it use the value of 8 if the variable is not found?
The variable is not set in the environment by default. It is described here:
http://techpubs.sgi.com/library/dynaweb_docs/0620/SGI_Developer/books/MPT_MPI_PM/sgi_html/ch06.html
If it is not set, MPT falls back to the default of 8.
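
As a quick sanity check on a given system, the effective limit can be printed with a small standalone program (a minimal sketch; it only assumes the fall-back-to-8 behaviour described in that document):

    #include <stdio.h>
    #include <stdlib.h>

    /* Print the MPI_TYPE_DEPTH value MPT will effectively use.
       Assumes the documented behaviour: unset means a limit of 8. */
    int main(void)
    {
        const char *s = getenv("MPI_TYPE_DEPTH");
        int depth = 8;                        /* documented default */
        if (s != NULL && sscanf(s, "%d", &depth) != 1) depth = 8;
        printf("Effective MPI_TYPE_DEPTH: %d\n", depth);
        return 0;
    }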

>
>     We can use getenv("MPI_TYPE_DEPTH"); in PETSc when the HDF5 viewer is created to make sure the value is sane and otherwise produce a useful error message telling the user exactly what to do. BUT we need to somehow limit this test to machines where it matters. So for example
>
>     PETSC_EXTERN PetscErrorCode PetscViewerCreate_HDF5(PetscViewer v)
> {
>    PetscViewer_HDF5 *hdf5;
>    PetscErrorCode   ierr;
>    const char *typedepth;
>    int itypedepth;
>
>    PetscFunctionBegin;
> #if defined(PETSC_HAVE_HDF5_REQUIRE_LARGE_MPI_TYPE_DEPTH)
>    typedepth = getenv("MPI_TYPE_DEPTH");  /* NULL if the variable is not set */
>    itypedepth = 0;
>    if (typedepth) sscanf(typedepth,"%d",&itypedepth);
>    if (itypedepth < 100) SETERRQ(...,"This system requires you do \"export MPI_TYPE_DEPTH=100\" before submitting jobs when using HDF5");
> #endif
>
>    but we need a configure test that determines if this is such a system. Can you tell us a "system command" we could run in our configure to detect these SGI MPT systems?
>
Is it really PETSc's task to warn about this? PETSc should trust HDF5 to
"just work", and HDF5 should actually print sensible warning/error
messages. Shouldn't it?

I'll think about that system command until tomorrow...

>    Thanks
>
>     Barry
>
> A big FAT error message is always better than a FAQ when possible.
Of course.

Håkon

>
>
>>
>> I have found that setting this to at least 32 will make my examples run perfectly on up to 256 processes. No error messages whatsoever, and in my simple load-and-write dataset roundtrip h5diff compares the two datasets and finds them identical. I also notice that Leibniz Rechenzentrum recommends setting this variable to 100 (or some other suitably large value) when using NetCDF together with MPT (https://www.lrz.de/services/software/io/netcdf/).
>>
>> This bug has been a pain in the (***)... Perhaps it is worthy of a FAQ entry?
>>
>> Thanks for your time and effort.
>>
>> Regards,
>> Håkon Strandenes
>>
>>
>> On 26. nov. 2014 08:01, Håkon Strandenes wrote:
>>>
>>>
>>> On 25. nov. 2014 22:40, Matthew Knepley wrote:
>>>> On Tue, Nov 25, 2014 at 2:34 PM, Håkon Strandenes <haakon at hakostra.net
>>>> <mailto:haakon at hakostra.net>> wrote:
>>>>
>>>> (...)
>>>>
>>>> First, this is great debugging.
>>>
>>> Thanks.
>>>
>>>>
>>>> Second, my reading of the HDF5 document you linked to says that either
>>>> selection should be valid:
>>>>
>>>>    "For non-regular hyperslab selection, parallel HDF5 uses independent
>>>> IO internally for this option."
>>>>
>>>> so it ought to fall back to the INDEPENDENT model if it can't do
>>>> collective calls correctly. However,
>>>> it appears that the collective call has bugs.
>>>>
>>>> My conclusion: Since you have determined that changing the setting to
>>>> INDEPENDENT produces
>>>> correct input/output in all the test cases, and since my understanding
>>>> of the HDF5 documentation is
>>>> that we should always be able to use COLLECTIVE as an option, this is an
>>>> HDF5 or MPT bug.
>>>
>>> I have conducted yet another test:
>>> My example (ex10) that I previously posted to the mailing list was set
>>> up with 250 grid points along each axis. When the topic of chunking was
>>> brought up, I realized that 250 is not evenly divisible by four. The
>>> example failed on 64 processes, that is, four processes along each
>>> direction (the division is 62 + 62 + 63 + 63 = 250).
>>>
>>> So I have recompiled "my ex10" with 256 grid points in each direction. It
>>> turns out that this does indeed run successfully on 64 processes. Great! It
>>> also runs on 128 processes, that is, an 8x4x4 decomposition. However, it
>>> does not run on 125 processes, that is, a 5x5x5 decomposition.
>>>
>>> The same pattern is clear if I run my example with 250^3 grid points. It
>>> does not run on process counts like 64 and 128, but does run successfully
>>> on 125 processes, again only when the sub-domains are of exactly equal
>>> size (in this case the domain is divided as 5x5x5).
>>>
>>> However, I still believe that there are bugs. I did my "roundtrip" by
>>> loading a dataset and immediately writing the same dataset to a
>>> different file, this time a 250^3 dataset on 125 processes. It did not
>>> "pass" this test, i.e. the written dataset was just garbage. I have not
>>> yet identified whether the garbling is introduced during the reading or
>>> the writing of the dataset.
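>>>
>>> For completeness, the roundtrip itself is nothing more than a VecLoad
>>> followed by a VecView through the HDF5 viewer; a minimal sketch (the file
>>> names, the dataset name and the error-checking style are placeholders, not
>>> my exact test code) looks like:
>>>
>>> #include <petscdmda.h>
>>> #include <petscviewerhdf5.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>   DM             da;
>>>   Vec            v;
>>>   PetscViewer    viewer;
>>>   PetscErrorCode ierr;
>>>
>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>>>   /* 250^3 grid, decomposition chosen by PETSc */
>>>   ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>>>                       DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 250, 250, 250,
>>>                       PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
>>>                       NULL, NULL, NULL, &da);CHKERRQ(ierr);
>>>   ierr = DMSetFromOptions(da);CHKERRQ(ierr);
>>>   ierr = DMSetUp(da);CHKERRQ(ierr);
>>>   ierr = DMCreateGlobalVector(da, &v);CHKERRQ(ierr);
>>>   ierr = PetscObjectSetName((PetscObject)v, "data");CHKERRQ(ierr);
>>>
>>>   /* load the dataset from the input file... */
>>>   ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "in.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
>>>   ierr = VecLoad(v, viewer);CHKERRQ(ierr);
>>>   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
>>>
>>>   /* ...and immediately write it back out to a second file */
>>>   ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "out.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
>>>   ierr = VecView(v, viewer);CHKERRQ(ierr);
>>>   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
>>>
>>>   ierr = VecDestroy(&v);CHKERRQ(ierr);
>>>   ierr = DMDestroy(&da);CHKERRQ(ierr);
>>>   ierr = PetscFinalize();
>>>   return ierr;
>>> }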
>>>
>>>>
>>>> Does anyone else read the HDF5 documentation differently? Also, it really looks to me
>>>> like HDF5 messed up the MPI
>>>> data type in the COLLECTIVE picture below, since it appears to be sliced
>>>> incorrectly.
>>>>
>>>> Possible Remedies:
>>>>
>>>>    1) We can allow you to turn off H5Pset_dxpl_mpio()
>>>>
>>>>    2) Send this test case to the MPI/IO people at ANL
>>>>
>>>> If you think 1) is what you want, we can do it. If you can package this
>>>> work for 2), it would be really valuable.
>>>
>>> I will be fine editing gr2.c manually each time this file is changed (I
>>> use the sources from Git). But *if* this is not a bug in MPT, but a bug in
>>> PETSc or HDF5, it should be fixed... because it is the kind of bug that
>>> is extremely annoying and a real pain to track down.
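>>>
>>> For reference, the manual edit is just a matter of switching the transfer
>>> mode on the data transfer property list from collective to independent.
>>> Roughly, and assuming the usual property-list setup around the call:
>>>
>>>     hid_t plist_id = H5Pcreate(H5P_DATASET_XFER);  /* data transfer property list */
>>>     /* collective transfers, as requested now:            */
>>>     /* H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);  */
>>>     /* the manual workaround: fall back to independent I/O */
>>>     H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_INDEPENDENT);
>>>     /* ...pass plist_id to H5Dread/H5Dwrite, then release it */
>>>     H5Pclose(plist_id);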
>>>
>>> Perhaps the HDF5 mailing list could contribute to this issue?
>>>
>>>>
>>>>    Thanks,
>>>>
>>>>      Matt
>>>>
>>>>     Thanks for your time.
>>>>
>>>>     Best regards,
>>>>     Håkon Strandenes
>>>>
>>>>
>>>
>>> Again thanks for your time.
>>>
>>> Regards,
>>> Håkon
>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which
>>>> their experiments lead.
>>>> -- Norbert Wiener
>
>

