[Darshan-users] Module contains incomplete data
Phil Carns
carns at mcs.anl.gov
Tue May 10 15:28:09 CDT 2022
Hunh, that's interesting. On most systems, the install process would
set either RUNPATH or RPATH in the executable to point to the install
directory's lib path so that LD_LIBRARY_PATH does not need to be set.
For example, the darshan-parser binary installed in
/home/carns/working/install/bin on my laptop shows this:
carns-x1-7g ~/w/i/bin> readelf -d darshan-parser |grep RUNPATH
0x000000000000001d (RUNPATH) Library runpath:
[/home/carns/working/install/lib]
carns-x1-7g ~/w/i/bin> readelf -d darshan-parser |grep RPATH
You have to check for both because a given platform might use either
RUNPATH or RPATH. They have slightly different semantics but accomplish
the same thing for our purposes. At any rate, since the executable has
the path embedded within it, it will work without having to set LD_LIBRARY_PATH.
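Since either tag may be present, a combined check is handy (BIN below is just a stand-in target for illustration; point it at your own installed darshan-parser):

```shell
# Print whichever embedded library search path is present, RPATH or RUNPATH;
# fall back to a note if the binary has neither.
# BIN is a placeholder target; substitute your installed darshan-parser.
BIN=/bin/sh
readelf -d "$BIN" | grep -E 'R(UN)?PATH' || echo "no RPATH/RUNPATH entry"
```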
I'm not sure why you are having a different experience, but it sounds
like things are up and running for you anyhow.
thanks,
-Phil
On 5/9/22 1:59 AM, Jiří Nádvorník wrote:
> Hi Phil and others,
>
> in the end the problem was somewhere else and much more prosaic :/.
> The path where darshan was installed by default (lib folder
> specifically) was missing from the LD_LIBRARY_PATH environment
> variable and therefore the library could not be found at runtime.
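> For completeness, the fix was just to extend the variable
> before running (the prefix below is a made-up example; use
> the actual install prefix):

```shell
# Prepend the Darshan install's lib directory to the runtime library
# search path, preserving any existing entries.
# PREFIX is a hypothetical install prefix, not the real one from the thread.
PREFIX=/home/user/darshan/install
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```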
>
> BR,
>
> Jiri
>
>
>
> On Fri, May 6, 2022 at 15:56, Jiří Nádvorník
> <nadvornik.ji at gmail.com> wrote:
>
> Aha,
>
> maybe I phrased it poorly. The problem is that the version
> installed by make install is taken from the .libs folder
> and not from the darshan-util one. So when trying to run,
> for example, darshan-parser, it complains about
> ".libs/libdarshan-util.so.0" missing.
>
> Cheers,
>
> Jiri
>
> On Wed, May 4, 2022 at 22:41, Phil Carns <carns at mcs.anl.gov>
> wrote:
>
> Hi Jiri,
>
> For #2, are you talking about .libs/darshan-parser in the
> build path? If so, I wouldn't expect to be able to run that
> directly (in general, not particular to Darshan's build system).
>
> Autotools and libtool create executables in .libs subdirs as
> part of the build process, but those are intermediate
> executables that don't have final library paths set. The copy
> installed in <prefix>/bin should be fine (and should use
> corresponding installed libraries), and you should also be
> able to run darshan-parser one level up in the build tree
> (that's actually a shell script wrapper created by libtool
> that will run the .libs/darshan-parser with library paths set
> to the build tree).
>
> Sometimes .libs/ executables might work, but it's a little
> dicey what libraries they will pick up so it's usually not a
> good idea.
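> A quick way to tell the wrapper script from the real
> binary, if you're curious -- this just checks the file's
> magic bytes (the kind() helper and the /bin/sh target are
> mine, for illustration):

```shell
# Print "ELF" for a real binary, "script" otherwise, by checking the
# first four bytes of the file for the ELF magic number.
# In a darshan-util build tree, "kind darshan-parser" would report the
# libtool wrapper script and "kind .libs/darshan-parser" the ELF binary.
kind() {
  if head -c 4 "$1" | grep -q "$(printf '\177ELF')"; then
    echo "ELF"
  else
    echo "script"
  fi
}

kind /bin/sh   # stand-in target; an ELF binary on typical Linux systems
```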
>
> thanks,
>
> -Phil
>
> On 4/27/22 1:34 PM, Jiří Nádvorník wrote:
>> Hi,
>>
>> to reproduce the installation issue:
>> mkdir darshan_root
>> cd darshan_root
>> git clone https://github.com/darshan-hpc/darshan.git .
>>
>> Then cd into darshan-util/ and run:
>> autoconf
>> ./configure
>> make install
>>
>> Then:
>>
>> 1. If running darshan-parser within the same folder,
>> it runs fine.
>> 2. If running .libs/darshan-parser (which is what
>> make install installs), it crashes with the library
>> not available; see the previous email.
>>
>> Cheers,
>>
>> Jiri
>>
>>
>>
>> On Wed, Apr 27, 2022 at 18:23, Snyder, Shane
>> <ssnyder at mcs.anl.gov> wrote:
>>
>> Great, I'm glad that you were able to get the
>> instrumentation mostly working!
>>
>> I think it's sensible to ignore Python source/compiled
>> code for most cases -- I doubt there's any insight to
>> gain and you'll just end up trying to filter them out
>> when analyzing logs anyways.
>>
>> I'm not sure what's going on with the installation issues
>> you mention. If you think something might be wrong with
>> Darshan's build, then would you mind sharing how you ran
>> configure, etc.? I could see if I'm able to reproduce
>> anything.
>>
>> If you wouldn't mind starting a new thread related to the
>> HDF5 issue, I think that would be helpful -- it might
>> help if other users ever want to search the list archive
>> for h5py/HDF5 related issues if you include those in the
>> title.
>>
>> --Shane
>> ------------------------------------------------------------------------
>> *From:* Jiří Nádvorník <nadvornik.ji at gmail.com>
>> *Sent:* Wednesday, April 27, 2022 11:06 AM
>> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
>> *Cc:* darshan-users at lists.mcs.anl.gov
>> <darshan-users at lists.mcs.anl.gov>
>> *Subject:* Re: [Darshan-users] Module contains incomplete
>> data
>> Hi,
>>
>> yes, that NAMEMEM setting got it done. I also excluded .py
>> and .pyc files -- those reads are only the loading of the
>> modules, right? No data access itself (and no, I'm not
>> reading and manually interpreting my own Python files :),
>> so I'm not interested in that). Actually, I'm reading
>> thousands of small files which I'm ingesting into HDF5,
>> and I'm interested in how many reads, etc. are happening.
>>
>> I'm trying to make some sense of what I see, but for now
>> I'm just going to say it's very valuable data for me. It's
>> a pity I can't get the HDF5 module working; the extra
>> granularity it would give me would be very helpful.
>>
>> Regarding the darshan-utils you were right, I didn't
>> reinstall them. I actually ran into an install problem --
>> for some reason, the git installation takes
>> .libs/darshan-parser when installing it to /usr/local/...
>> and that one throws:
>> darshan-parser: error while loading shared libraries:
>> libdarshan-util.so.0: cannot open shared object file: No
>> such file or directory
>>
>> But if I run the darshan-parser within
>> darshan_root_folder/darshan-util/ then the error is gone
>> and --show-incomplete |grep incomplete prints nothing.
>>
>> Could we now focus on the HDF5 issue or should I create a
>> new thread for clarity?
>>
>> Cheers,
>>
>> Jiri
>>
>>
>>
>>
>>
>> On Wed, Apr 27, 2022 at 17:21, Snyder, Shane
>> <ssnyder at mcs.anl.gov> wrote:
>>
>> Thanks for working through the build issues and
>> giving this a shot.
>>
>> A couple of things stand out to me (ignoring your
>> HDF5 issue for now):
>>
>> * It looks like at least the MPI-IO module is no
>> longer reporting partial data? Small progress...
>> * There is a new warning about there being no log
>> utility handlers for a "null" module. Are you
>> perhaps parsing a log generated by your prior
>> Darshan install? Maybe you have not completely
>> re-installed a new darshan-util? We should figure
>> out what's going on there, too, to be safe.
>>
>> I'd also suggest two things for your config file:
>>
>> * Dial back the MODMEM and MAX_RECORDS values. Your
>> MODMEM value is asking Darshan to allocate a GiB
>> of memory (it is expressed in MiB units and you
>> set it to 1024), which Darshan will happily try to
>> do, though I'm not sure that's a good idea. I'd
>> probably start with a MODMEM value of 8 and
>> MAX_RECORDS of 2000, and just double those again
>> if needed -- anything beyond that would be
>> surprising unless you know your workload really
>> is opening hundreds of thousands of files.
>> You might also have a look at the files Darshan
>> is currently instrumenting and see if you really
>> want them all -- I've noticed when instrumenting
>> Python frameworks that you can get tons of
>> records for things like shared libraries, source
>> files, etc. that can just be excluded using the
>> NAME_EXCLUDE mechanisms.
>> * Add "NAMEMEM 2" to your config file to force
>> Darshan to allocate more memory (2 MiB) for
>> storing the filenames associated with each
>> record. This might actually be the main reason
>> your log is reporting partial data rather than
>> actually running out of module data, which is
>> another reason not to get too aggressive with the
>> MODMEM/MAX_RECORDS parameters. I should have
>> mentioned this setting originally as there have
>> been other users who have reported exceeding it
>> recently.
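>>
>> Put together, a starting config file along those
>> lines might look like this (illustrative values,
>> doubled only if the warnings persist):

```
MODMEM 8
NAMEMEM 2
MAX_RECORDS 2000 POSIX,MPI-IO,STDIO
```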
>>
>> Hopefully that gets you further along and we can move
>> onto the HDF5 issue you mention.
>>
>> Thanks,
>> --Shane
>> ------------------------------------------------------------------------
>> *From:* Jiří Nádvorník <nadvornik.ji at gmail.com>
>> *Sent:* Wednesday, April 27, 2022 6:37 AM
>> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
>> *Cc:* darshan-users at lists.mcs.anl.gov
>> <darshan-users at lists.mcs.anl.gov>
>> *Subject:* Re: [Darshan-users] Module contains
>> incomplete data
>> Aha! I just realized there is an obvious "prepare.sh"
>> script that I didn't run. I only found it by trial and
>> error, though; it could be better documented :).
>>
>> Now I've gotten further. With a config file:
>> MAX_RECORDS 102400 POSIX,MPI-IO,STDIO
>> MODMEM 1024
>> APP_EXCLUDE git,ls
>>
>> Running:
>> darshan-parser --show-incomplete
>> caucau_python_id127447-127447_4-27-48556-1842455298968263838_1.darshan
>> |grep incomplete
>>
>> I get the output:
>> # *WARNING*: The POSIX module contains incomplete data!
>> # *WARNING*: The STDIO module contains incomplete data!
>> Warning: no log utility handlers defined for module
>> (null), SKIPPING.
>>
>> I don't think my poor tiny Python script touches
>> more than 100,000 files, right?
>>
>> By the way, I've encountered another problem; I'm not
>> sure whether to put it in another thread. If I compile
>> with HDF5 (the results above are without it):
>> ./configure --with-log-path=/gpfs/raid/darshan-logs
>> --with-jobid-env=PBS_JOBID CC=mpicc --enable-hdf5-mod
>> --with-hdf5=/gpfs/raid/SDSSCube/ext_lib//hdf5-1.12.0/hdf5/
>>
>> It messes up my runtime and causes Python to crash:
>> mpirun -x
>> DARSHAN_CONFIG_PATH=/gpfs/raid/SDSSCube/darshan.conf
>> -x
>> LD_PRELOAD=/gpfs/raid/shared_libs/darshan/darshan-runtime/lib/.libs/libdarshan.so:/gpfs/raid/SDSSCube/ext_lib/hdf5-1.12.0/hdf5/lib/libhdf5.so
>> -np 65 --hostfile hosts --map-by node
>> /gpfs/raid/SDSSCube/venv_par/bin/python hisscube.py
>> --truncate ../sdss_data/ results/SDSS_cube_c_par.h5
>>
>> Resulting in:
>> INFO:rank[0]:Rank 0 pid: 137058
>> Darshan HDF5 module error: runtime library version
>> (1.12) incompatible with Darshan module (1.10-).
>> Traceback (most recent call last):
>> File "hisscube.py", line 74, in <module>
>> writer.ingest(fits_image_path, fits_spectra_path,
>> truncate_file=args.truncate)
>> File
>> "/gpfs/raid/SDSSCube/hisscube/ParallelWriterMWMR.py",
>> line 45, in ingest
>> self.process_metadata(image_path, image_pattern,
>> spectra_path, spectra_pattern, truncate_file)
>> File "/gpfs/raid/SDSSCube/hisscube/CWriter.py",
>> line 150, in process_metadata
>> h5_file = self.open_h5_file_serial(truncate_file)
>> File "/gpfs/raid/SDSSCube/hisscube/CWriter.py",
>> line 170, in open_h5_file_serial
>> return h5py.File(self.h5_path, 'w',
>> fs_strategy="page", fs_page_size=4096, libver="latest")
>> File
>> "/gpfs/raid/SDSSCube/venv_par/lib/python3.8/site-packages/h5py-3.6.0-py3.8-linux-x86_64.egg/h5py/_hl/files.py",
>> line 533, in __init__
>> fid = make_fid(name, mode, userblock_size, fapl,
>> fcpl, swmr=swmr)
>> File
>> "/gpfs/raid/SDSSCube/venv_par/lib/python3.8/site-packages/h5py-3.6.0-py3.8-linux-x86_64.egg/h5py/_hl/files.py",
>> line 232, in make_fid
>> fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl,
>> fcpl=fcpl)
>> File "h5py/_objects.pyx", line 54, in
>> h5py._objects.with_phil.wrapper
>> File "h5py/_objects.pyx", line 55, in
>> h5py._objects.with_phil.wrapper
>> File "h5py/h5f.pyx", line 126, in h5py.h5f.create
>> File "h5py/defs.pyx", line 693, in h5py.defs.H5Fcreate
>> RuntimeError: Unspecified error in H5Fcreate (return
>> value <0)
>>
>> You said that Darshan should be compatible with
>> HDF5 > 1.8, which 1.12 is, right?
>>
>> Thanks for the help!
>>
>> Cheers,
>>
>> Jiri
>>
>>
>>
>>
>>
>>
>> On Wed, Apr 27, 2022 at 8:43, Jiří Nádvorník
>> <nadvornik.ji at gmail.com> wrote:
>>
>> Hi,
>>
>> I think I will chew through the documentation
>> just fine, but two things aren't clear:
>>
>> 1. Does the darshan library provide its own
>> config file that I need to change, or do I
>> always need to create my own?
>> 2. How can I build the git version? I didn't
>> find any instructions and the usual autoconf
>> just throws:
>> root at kub-b1:/gpfs/raid/shared_libs/darshan/darshan-runtime#
>> autoconf
>> configure.ac:19: error: possibly undefined macro:
>> AC_CONFIG_MACRO_DIRS
>> If this token and others are legitimate,
>> please use m4_pattern_allow.
>> See the Autoconf documentation.
>> configure.ac:21: error: possibly undefined macro:
>> AM_INIT_AUTOMAKE
>> configure.ac:22: error: possibly undefined macro:
>> AM_SILENT_RULES
>> configure.ac:23: error: possibly undefined macro:
>> AM_MAINTAINER_MODE
>> configure.ac:713: error: possibly undefined macro:
>> AM_CONDITIONAL
>> root at kub-b1:/gpfs/raid/shared_libs/darshan/darshan-runtime#
>> ./configure
>> configure: error: cannot find install-sh,
>> install.sh, or shtool in ../maint/scripts
>> "."/../maint/scripts
>>
>> Thanks for the help.
>>
>> Cheers,
>>
>> Jiri
>>
>> On Tue, Apr 26, 2022 at 17:43, Snyder, Shane
>> <ssnyder at mcs.anl.gov> wrote:
>>
>> Hi Jiri,
>>
>> For some background, Darshan enforces some
>> internal memory limits to avoid ballooning
>> memory usage at runtime. Specifically, all of
>> our instrumentation modules should
>> pre-allocate file records for up to 1,024
>> files opened by the app -- if your app opens
>> more than 1,024 files per-process, Darshan
>> stops instrumenting and issues those warning
>> messages when parsing the log file.
>>
>> We have users hit this issue pretty
>> frequently now, and we actually just wrapped
>> up development of some new mechanisms to help
>> out with this. They were just merged into our
>> main branch, and we will be formally
>> releasing a pre-release version of this code
>> in the next week or so. For the time being,
>> you should be able to use the 'main' branch
>> of our repo
>> (https://github.com/darshan-hpc/darshan) to
>> leverage this new functionality.
>>
>> There are 2 new mechanisms that can help out,
>> both of which require you to provide a
>> configuration file to Darshan at runtime:
>>
>> * MAX_RECORDS setting can be used to bump
>> up the number of pre-allocated records
>> for different modules. In your case, you
>> might try to bump up the default number
>> of records for the POSIX, MPI-IO, and
>> STDIO modules by setting something like
>> this in your config file (this would
>> allow you to instrument up to 4000 files
>> per-process for each of these modules):
>> o MAX_RECORDS 4000 POSIX,MPI-IO,STDIO
>> * An alternative (or complementary)
>> approach to bumping up the record limit
>> is to limit instrumentation to particular
>> files. You can use the NAME_EXCLUDE
>> setting to avoid instrumenting specific
>> directory paths, file extensions, etc by
>> specifying regular expressions. E.g., the
>> following settings would avoid
>> instrumenting files with a .so extension or
>> files located in a directory we don't
>> care about, for all modules (* denotes all
>> modules):
>> o NAME_EXCLUDE .so$ *
>> o NAME_EXCLUDE ^/path/to/avoid *
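>>
>> Collected into a single config file, those two
>> mechanisms together would look like this (the
>> directory path is a placeholder, not a real one):

```
MAX_RECORDS 4000 POSIX,MPI-IO,STDIO
NAME_EXCLUDE .so$ *
NAME_EXCLUDE ^/path/to/avoid *
```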
>>
>> I'm attaching the updated runtime
>> documentation for Darshan for your reference.
>> Section 8 provides a ton of detail on how to
>> provide a config file to Darshan, which should
>> fill in any gaps in my description above.
>>
>> Please let us know if you have any further
>> questions or issues, though!
>>
>> Thanks,
>> --Shane
>> ------------------------------------------------------------------------
>> *From:* Darshan-users
>> <darshan-users-bounces at lists.mcs.anl.gov> on
>> behalf of Jiří Nádvorník <nadvornik.ji at gmail.com>
>> *Sent:* Sunday, April 24, 2022 3:00 PM
>> *To:* darshan-users at lists.mcs.anl.gov
>> <darshan-users at lists.mcs.anl.gov>
>> *Subject:* [Darshan-users] Module contains
>> incomplete data
>> Hi All,
>>
>> I just tried out Darshan, and the potential
>> output seems perfect for my HDF5 MPI
>> application! I'm just not able to get there
>> yet :(.
>>
>> I have a log that has a big stamp "This
>> darshan log contains incomplete data".
>>
>> When I run:
>> darshan-parser --show-incomplete
>> mylog.darshan |grep incomplete
>> Output is:
>> # *WARNING*: The POSIX module contains
>> incomplete data!
>> # *WARNING*: The MPI-IO module contains
>> incomplete data!
>> # *WARNING*: The STDIO module contains
>> incomplete data!
>>
>> Would you be able to point me to some setting
>> that would improve the measurements? Can I
>> actually rely on the profiling results if it
>> says the data is incomplete in some of the
>> categories?
>>
>> Thank you very much for your help!
>>
>> Cheers,
>>
>> Jiri
>>
>>
>> _______________________________________________
>> Darshan-users mailing list
>> Darshan-users at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users