[Darshan-users] Module contains incomplete data

Jiří Nádvorník nadvornik.ji at gmail.com
Mon May 9 00:59:47 CDT 2022


Hi Phil and others,

in the end the problem was somewhere else and much more prosaic :/. The
path where Darshan was installed by default (the lib folder specifically) was
missing from the LD_LIBRARY_PATH environment variable, so the library
could not be found at runtime.
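
For anyone hitting the same symptom, a minimal sketch of the fix (the
install prefix below is hypothetical; use whatever --prefix Darshan was
actually configured with):

```shell
# Hypothetical prefix -- substitute the actual Darshan install location.
DARSHAN_PREFIX=/usr/local
# Prepend the lib directory, preserving any existing LD_LIBRARY_PATH.
export LD_LIBRARY_PATH="$DARSHAN_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Putting the export in the job script (or passing it through mpirun with -x)
makes sure the runtime loader can resolve libdarshan-util.so.0.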

BR,

Jiri



On Fri, May 6, 2022 at 15:56, Jiří Nádvorník <nadvornik.ji at gmail.com>
wrote:

> Aha,
>
> maybe I phrased it wrong. The problem is that the version that is
> installed by make install is taken from the .libs folder and not from the
> darshan-util one. So when trying to run, for example, darshan-parser, it
> complains about ".libs/libdarshan-util.so.0" missing.
>
> Cheers,
>
> Jiri
>
> On Wed, May 4, 2022 at 22:41, Phil Carns <carns at mcs.anl.gov> wrote:
>
>> Hi Jiri,
>>
>> For #2, are you talking about .libs/darshan-parser in the build path?  If
>> so, I wouldn't expect to be able to run that directly (in general, not
>> particular to Darshan's build system).
>>
>> Autotools and libtool create executables in .libs subdirs as part of the
>> build process, but those are intermediate executables that don't have final
>> library paths set.  The copy installed in <prefix>/bin should be fine (and
>> should use corresponding installed libraries), and you should also be able
>> to run darshan-parser one level up in the build tree (that's actually a
>> shell script wrapper created by libtool that will run the
>> .libs/darshan-parser with library paths set to the build tree).
>>
>> Sometimes .libs/ executables might work, but it's a little dicey what
>> libraries they will pick up so it's usually not a good idea.
>>
>> thanks,
>>
>> -Phil
>> On 4/27/22 1:34 PM, Jiří Nádvorník wrote:
>>
>> Hi,
>>
>> to reproduce the installation issue:
>> mkdir darshan_root
>> cd darshan_root
>> git clone https://github.com/darshan-hpc/darshan.git .
>>
>> Then cd darshan-util/ and run:
>> autoconf
>> ./configure
>> make install
>>
>> Then:
>>
>>    1. If running darshan-parser within the same folder it runs fine.
>>    2. If running .libs/darshan-parser (which is installed by make
>>    install), it crashes with the library not being available; see the previous email.
>>
>> Cheers,
>>
>> Jiri
>>
>>
>>
>> On Wed, Apr 27, 2022 at 18:23, Snyder, Shane <ssnyder at mcs.anl.gov>
>> wrote:
>>
>>> Great, I'm glad that you were able to get the instrumentation mostly
>>> working!
>>>
>>> I think it's sensible to ignore Python source/compiled code for most
>>> cases -- I doubt there's any insight to gain, and you'll just end up trying
>>> to filter them out when analyzing logs anyway.
>>>
>>> I'm not sure what's going on with the installation issues you mention.
>>> If you think something might be wrong with Darshan's build, then would you
>>> mind sharing how you ran configure, etc.? I could see if I'm able to
>>> reproduce anything.
>>>
>>> If you wouldn't mind starting a new thread related to the HDF5 issue, I
>>> think that would be helpful -- it might help if other users ever want to
>>> search the list archive for h5py/HDF5 related issues if you include those
>>> in the title.
>>>
>>> --Shane
>>> ------------------------------
>>> *From:* Jiří Nádvorník <nadvornik.ji at gmail.com>
>>> *Sent:* Wednesday, April 27, 2022 11:06 AM
>>> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
>>> *Cc:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>>> *Subject:* Re: [Darshan-users] Module contains incomplete data
>>>
>>> Hi,
>>>
>>> yes, that NAMEMEM got it done. I also excluded .py and .pyc files --
>>> those reads are only the loading of them, right? No data access itself (and no,
>>> I'm not reading and manually interpreting my own Python files :), so I'm
>>> not interested in that). Actually, I'm reading thousands of small files
>>> which I'm ingesting into HDF5, and I'm interested in how many reads, etc.
>>> are happening.
>>>
>>> I'm trying to make some sense of what I see, but for now I'm just going
>>> to say it's very valuable data for me. Pity I can't get the HDF5 module;
>>> the extra granularity it would give would be very helpful.
>>>
>>> Regarding darshan-util you were right, I didn't reinstall it. I
>>> actually ran into an install problem -- for some reason, the git
>>> installation takes the .libs/darshan-parser when installing it to
>>> /usr/local/... and that one throws:
>>> darshan-parser: error while loading shared libraries:
>>> libdarshan-util.so.0: cannot open shared object file: No such file or
>>> directory
>>>
>>> But if I run the darshan-parser within darshan_root_folder/darshan-util/
>>> then the error is gone and --show-incomplete |grep incomplete prints
>>> nothing.
>>>
>>> Could we now focus on the HDF5 issue or should I create a new thread for
>>> clarity?
>>>
>>> Cheers,
>>>
>>> Jiri
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 27, 2022 at 17:21, Snyder, Shane <ssnyder at mcs.anl.gov>
>>> wrote:
>>>
>>> Thanks for working through the build issues and giving this a shot.
>>>
>>> A couple of things stand out to me (ignoring your HDF5 issue for now):
>>>
>>>    - It looks like at least the MPI-IO module is no longer reporting
>>>    partial data? Small progress...
>>>    - There is a new warning about there being no log utility handlers
>>>    for a "null" module. Are you perhaps parsing a log generated by your prior
>>>    Darshan install? Maybe you have not completely re-installed a new
>>>    darshan-util? We should figure out what's going on there, too, to be safe.
>>>
>>> I'd also suggest two things for your config file:
>>>
>>>    - Dial back the MODMEM and MAX_RECORDS values. Your MODMEM value is
>>>    asking Darshan to allocate a GiB of memory (it is expressed in MiB units
>>>    and you set it to 1024), which Darshan will happily try to do, though I'm
>>>    not sure it's a good idea. I'd probably start with a MODMEM value of 8 and
>>>    MAX_RECORDS of 2000, and just double those again if needed -- anything
>>>    beyond that would be surprising unless you know your workload is really
>>>    opening hundreds of thousands of files. You might also have a look at the
>>>    files Darshan is currently instrumenting and see if you really want it to
>>>    -- I've noticed when instrumenting Python frameworks that you can get tons
>>>    of records for things like shared libraries, source files, etc. that can
>>>    just be ignored using NAME_EXCLUDE mechanisms.
>>>    - Add "NAMEMEM  2" to your config file to force Darshan to allocate
>>>    more memory (2 MiB) for storing the filenames associated with each record.
>>>    This might actually be the main reason your log is reporting partial data,
>>>    rather than actually running out of module memory, which is another reason
>>>    not to get too aggressive with the MODMEM/MAX_RECORDS parameters. I should
>>>    have mentioned this setting originally as there have been other users who
>>>    have reported exceeding it recently.
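>>>
>>> Putting those suggestions together, a starting-point config file might
>>> look like this (the values are just the conservative starting points from
>>> above, to be doubled if logs still report incomplete data):
>>> MODMEM  8
>>> NAMEMEM  2
>>> MAX_RECORDS     2000     POSIX,MPI-IO,STDIO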
>>>
>>> Hopefully that gets you further along and we can move onto the HDF5
>>> issue you mention.
>>>
>>> Thanks,
>>> --Shane
>>> ------------------------------
>>> *From:* Jiří Nádvorník <nadvornik.ji at gmail.com>
>>> *Sent:* Wednesday, April 27, 2022 6:37 AM
>>> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
>>> *Cc:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>>> *Subject:* Re: [Darshan-users] Module contains incomplete data
>>>
>>> Aha! I just realized there is an obvious "prepare.sh" script that I
>>> didn't run. I found out by trial and error, though; it could be better
>>> documented :).
>>>
>>> Now I've gotten further. With this config file:
>>> MAX_RECORDS     102400     POSIX,MPI-IO,STDIO
>>> MODMEM  1024
>>> APP_EXCLUDE     git,ls
>>>
>>> I'm getting for:
>>> darshan-parser --show-incomplete caucau_python_id127447-127447_4-27-48556-1842455298968263838_1.darshan | grep incomplete
>>>
>>> output:
>>> # *WARNING*: The POSIX module contains incomplete data!
>>> # *WARNING*: The STDIO module contains incomplete data!
>>> Warning: no log utility handlers defined for module (null), SKIPPING.
>>>
>>> I don't think my poor tiny Python script touches more than 100,000
>>> files, right?
>>>
>>> By the way, I've encountered another problem; not sure whether to put it
>>> in another thread. If I compile with HDF5 (the results above are without
>>> it):
>>> ./configure --with-log-path=/gpfs/raid/darshan-logs
>>> --with-jobid-env=PBS_JOBID CC=mpicc --enable-hdf5-mod
>>> --with-hdf5=/gpfs/raid/SDSSCube/ext_lib//hdf5-1.12.0/hdf5/
>>>
>>> It messes up my runtime and causes Python to crash:
>>> mpirun -x DARSHAN_CONFIG_PATH=/gpfs/raid/SDSSCube/darshan.conf -x
>>> LD_PRELOAD=/gpfs/raid/shared_libs/darshan/darshan-runtime/lib/.libs/libdarshan.so:/gpfs/raid/SDSSCube/ext_lib/hdf5-1.12.0/hdf5/lib/libhdf5.so
>>> -np 65 --hostfile hosts --map-by node
>>> /gpfs/raid/SDSSCube/venv_par/bin/python hisscube.py --truncate
>>> ../sdss_data/ results/SDSS_cube_c_par.h5
>>>
>>> Resulting in:
>>> INFO:rank[0]:Rank 0 pid: 137058
>>> Darshan HDF5 module error: runtime library version (1.12) incompatible
>>> with Darshan module (1.10-).
>>> Traceback (most recent call last):
>>>   File "hisscube.py", line 74, in <module>
>>>     writer.ingest(fits_image_path, fits_spectra_path,
>>> truncate_file=args.truncate)
>>>   File "/gpfs/raid/SDSSCube/hisscube/ParallelWriterMWMR.py", line 45, in
>>> ingest
>>>     self.process_metadata(image_path, image_pattern, spectra_path,
>>> spectra_pattern, truncate_file)
>>>   File "/gpfs/raid/SDSSCube/hisscube/CWriter.py", line 150, in
>>> process_metadata
>>>     h5_file = self.open_h5_file_serial(truncate_file)
>>>   File "/gpfs/raid/SDSSCube/hisscube/CWriter.py", line 170, in
>>> open_h5_file_serial
>>>     return h5py.File(self.h5_path, 'w', fs_strategy="page",
>>> fs_page_size=4096, libver="latest")
>>>   File
>>> "/gpfs/raid/SDSSCube/venv_par/lib/python3.8/site-packages/h5py-3.6.0-py3.8-linux-x86_64.egg/h5py/_hl/files.py",
>>> line 533, in __init__
>>>     fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
>>>   File
>>> "/gpfs/raid/SDSSCube/venv_par/lib/python3.8/site-packages/h5py-3.6.0-py3.8-linux-x86_64.egg/h5py/_hl/files.py",
>>> line 232, in make_fid
>>>     fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
>>>   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
>>>   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
>>>   File "h5py/h5f.pyx", line 126, in h5py.h5f.create
>>>   File "h5py/defs.pyx", line 693, in h5py.defs.H5Fcreate
>>> RuntimeError: Unspecified error in H5Fcreate (return value <0)
>>>
>>> You said that Darshan should be compatible with HDF5 > 1.8, and 1.12
>>> satisfies that, right?
>>>
>>> Thanks for the help!
>>>
>>> Cheers,
>>>
>>> Jiri
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 27, 2022 at 8:43, Jiří Nádvorník <nadvornik.ji at gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I think I will chew through the documentation just fine but two things
>>> are not clear:
>>>
>>>    1. Does the darshan library provide its own config file that I need
>>>    to change or do I need to always create my own?
>>>    2. How can I build the git version? I didn't find any instructions
>>>    and the usual autoconf just throws:
>>>       1. root at kub-b1:/gpfs/raid/shared_libs/darshan/darshan-runtime#
>>>       autoconf
>>>       configure.ac:19: error: possibly undefined macro:
>>>       AC_CONFIG_MACRO_DIRS
>>>             If this token and others are legitimate, please use
>>>       m4_pattern_allow.
>>>             See the Autoconf documentation.
>>>       configure.ac:21: error: possibly undefined macro: AM_INIT_AUTOMAKE
>>>       configure.ac:22: error: possibly undefined macro: AM_SILENT_RULES
>>>       configure.ac:23: error: possibly undefined macro:
>>>       AM_MAINTAINER_MODE
>>>       configure.ac:713: error: possibly undefined macro: AM_CONDITIONAL
>>>       root at kub-b1:/gpfs/raid/shared_libs/darshan/darshan-runtime#
>>>       ./configure
>>>       configure: error: cannot find install-sh, install.sh, or shtool
>>>       in ../maint/scripts "."/../maint/scripts
>>>
>>> Thanks for the help.
>>>
>>> Cheers,
>>>
>>> Jiri
>>>
>>> On Tue, Apr 26, 2022 at 17:43, Snyder, Shane <ssnyder at mcs.anl.gov>
>>> wrote:
>>>
>>> Hi Jiri,
>>>
>>> For some background, Darshan enforces some internal memory limits to
>>> avoid ballooning memory usage at runtime. Specifically, all of our
>>> instrumentation modules should pre-allocate file records for up to 1,024
>>> files opened by the app -- if your app opens more than 1,024 files
>>> per-process, Darshan stops instrumenting and issues those warning messages
>>> when parsing the log file.
>>>
>>> We have users hit this issue pretty frequently now, and we actually just
>>> wrapped up development of some new mechanisms to help out with this. They
>>> were just merged into our main branch, and we will be formally releasing a
>>> pre-release version of this code in the next week or so. For the time
>>> being, you should be able to use the 'main' branch of our repo (
>>> https://github.com/darshan-hpc/darshan) to leverage this new
>>> functionality.
>>>
>>> There are two new mechanisms that can help out, both of which require you
>>> to provide a configuration file to Darshan at runtime:
>>>
>>>    - The MAX_RECORDS setting can be used to bump up the number of
>>>    pre-allocated records for different modules. In your case, you might try to
>>>    bump up the default number of records for the POSIX, MPI-IO, and STDIO
>>>    modules by setting something like this in your config file (this would
>>>    allow you to instrument up to 4000 files per-process for each of these
>>>    modules):
>>>       - MAX_RECORDS    4000    POSIX,MPI-IO,STDIO
>>>    - An alternative (or complementary) approach to bumping up the
>>>    record limit is to limit instrumentation to particular files. You can use
>>>    the NAME_EXCLUDE setting to avoid instrumenting specific directory paths,
>>>    file extensions, etc. by specifying regular expressions. E.g., the following
>>>    settings would avoid instrumenting files with .so suffixes or files located
>>>    in a directory we don't care about, for all modules (* denotes all modules):
>>>       - NAME_EXCLUDE    .so$    *
>>>       - NAME_EXCLUDE    ^/path/to/avoid    *
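>>>
>>> As a concrete sketch, you could write those two settings into a config
>>> file and point the runtime library at it via the DARSHAN_CONFIG_PATH
>>> environment variable (the file name and values here are illustrative):

```shell
# Write an illustrative Darshan runtime config (values from the examples above).
cat > darshan.conf <<'EOF'
MAX_RECORDS    4000    POSIX,MPI-IO,STDIO
NAME_EXCLUDE   .so$    *
NAME_EXCLUDE   ^/path/to/avoid    *
EOF
# At run time, export DARSHAN_CONFIG_PATH so libdarshan finds it, e.g.:
#   mpirun -x DARSHAN_CONFIG_PATH=$PWD/darshan.conf ... ./my_app
```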
>>>
>>> I'm attaching the updated runtime documentation for Darshan for your
>>> reference. Section 8 provides a ton of detail on how to provide a config
>>> file to Darshan, which should fill in any gaps in my description above.
>>>
>>> Please let us know if you have any further questions or issues, though!
>>>
>>> Thanks,
>>> --Shane
>>> ------------------------------
>>> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on
>>> behalf of Jiří Nádvorník <nadvornik.ji at gmail.com>
>>> *Sent:* Sunday, April 24, 2022 3:00 PM
>>> *To:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>>> *Subject:* [Darshan-users] Module contains incomplete data
>>>
>>> Hi All,
>>>
>>> I just tried out Darshan, and the potential output seems perfect for my
>>> HDF5 MPI application! Unfortunately, I'm not able to get there :(.
>>>
>>> I have a log that has a big stamp "This darshan log contains incomplete
>>> data".
>>>
>>> When I run:
>>> darshan-parser --show-incomplete mylog.darshan | grep incomplete
>>> the output is:
>>> # *WARNING*: The POSIX module contains incomplete data!
>>> # *WARNING*: The MPI-IO module contains incomplete data!
>>> # *WARNING*: The STDIO module contains incomplete data!
>>>
>>> Would you be able to point me to some setting that would improve the
>>> measurements? Can I actually rely on the profiling results if it says the
>>> data is incomplete in some of the categories?
>>>
>>> Thank you very much for your help!
>>>
>>> Cheers,
>>>
>>> Jiri
>>>
>>>
>> _______________________________________________
>> Darshan-users mailing list
>> Darshan-users at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>
>

