[Darshan-users] Module contains incomplete data
Jiří Nádvorník
nadvornik.ji at gmail.com
Fri May 6 08:56:03 CDT 2022
Aha,
maybe I phrased it badly. The problem is that the copy installed by make
install is taken from the .libs folder and not from the darshan-util one.
So when trying to run, for example, darshan-parser, it complains that
".libs/libdarshan-util.so.0" is missing.
Cheers,
Jiri
On Wed, May 4, 2022 at 10:41 PM Phil Carns <carns at mcs.anl.gov> wrote:
> Hi Jiri,
>
> For #2, are you talking about .libs/darshan-parser in the build path? If
> so, I wouldn't expect to be able to run that directly (in general, not
> particular to Darshan's build system).
>
> Autotools and libtool create executables in .libs subdirs as part of the
> build process, but those are intermediate executables that don't have final
> library paths set. The copy installed in <prefix>/bin should be fine (and
> should use corresponding installed libraries), and you should also be able
> to run darshan-parser one level up in the build tree (that's actually a
> shell script wrapper created by libtool that will run the
> .libs/darshan-parser with library paths set to the build tree).
>
> Sometimes .libs/ executables might work, but it's a little dicey what
> libraries they will pick up so it's usually not a good idea.
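>
> For example (the paths here are illustrative):
>
>     # libtool wrapper in the build tree -- sets library paths for you
>     cd darshan-util && ./darshan-parser mylog.darshan
>
>     # installed copy -- uses the installed libraries
>     <prefix>/bin/darshan-parser mylog.darshan
>
>     # intermediate executable in .libs -- may pick up the wrong libraries
>     ./darshan-util/.libs/darshan-parser mylog.darshan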
>
> thanks,
>
> -Phil
> On 4/27/22 1:34 PM, Jiří Nádvorník wrote:
>
> Hi,
>
> to reproduce the installation issue:
> mkdir darshan_root
> cd darshan_root
> git clone https://github.com/darshan-hpc/darshan.git .
>
> Then cd into darshan-util/ and run:
> autoconf
> configure
> make install
>
> Then:
>
> 1. If I run darshan-parser from within the same folder, it runs fine.
> 2. If I run the .libs/darshan-parser copy (which is what make install
> installs), it crashes because the library is not available -- see the
> previous email.
>
> Cheers,
>
> Jiri
>
>
>
> On Wed, Apr 27, 2022 at 6:23 PM Snyder, Shane <ssnyder at mcs.anl.gov>
> wrote:
>
>> Great, I'm glad that you were able to get the instrumentation mostly
>> working!
>>
>> I think it's sensible to ignore Python source/compiled code for most
>> cases -- I doubt there's any insight to gain, and you'll just end up
>> trying to filter them out when analyzing logs anyway.
>>
>> I'm not sure what's going on with the installation issues you mention. If
>> you think something might be wrong with Darshan's build, then would you
>> mind sharing how you ran configure, etc.? I could see if I'm able to
>> reproduce anything.
>>
>> If you wouldn't mind starting a new thread related to the HDF5 issue, I
>> think that would be helpful -- it might help if other users ever want to
>> search the list archive for h5py/HDF5 related issues if you include those
>> in the title.
>>
>> --Shane
>> ------------------------------
>> *From:* Jiří Nádvorník <nadvornik.ji at gmail.com>
>> *Sent:* Wednesday, April 27, 2022 11:06 AM
>> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
>> *Cc:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>> *Subject:* Re: [Darshan-users] Module contains incomplete data
>>
>> Hi,
>>
>> yes, that NAMEMEM setting got it done. I also excluded .py and .pyc files
>> -- those reads are only the loading of the modules, right? No data access
>> itself (and no, I'm not reading and manually interpreting my own Python
>> files :), so I'm not interested in that). Actually, I'm reading thousands
>> of small files that I'm ingesting into HDF5, and I'm interested in how
>> many reads, etc. are happening.
>>
>> I'm still trying to make sense of what I see, but for now I'll just say
>> it's very valuable data for me. It's a pity I can't get the HDF5 module
>> working; the extra granularity would be very helpful.
>>
>> Regarding darshan-util you were right, I didn't reinstall it. I actually
>> ran into an install problem -- for some reason, the git installation
>> takes .libs/darshan-parser when installing it to /usr/local/..., and that
>> one throws:
>> darshan-parser: error while loading shared libraries:
>> libdarshan-util.so.0: cannot open shared object file: No such file or
>> directory
>>
>> But if I run the darshan-parser within darshan_root_folder/darshan-util/
>> then the error is gone and --show-incomplete |grep incomplete prints
>> nothing.
>>
>> Could we now focus on the HDF5 issue or should I create a new thread for
>> clarity?
>>
>> Cheers,
>>
>> Jiri
>>
>>
>>
>>
>>
>> On Wed, Apr 27, 2022 at 5:21 PM Snyder, Shane <ssnyder at mcs.anl.gov>
>> wrote:
>>
>> Thanks for working through the build issues and giving this a shot.
>>
>> A couple of things stand out to me (ignoring your HDF5 issue for now):
>>
>> - It looks like at least the MPI-IO module is no longer reporting
>> partial data? Small progress...
>> - There is a new warning about there being no log utility handlers
>> for a "null" module. Are you perhaps parsing a log generated by your prior
>> Darshan install? Maybe you have not completely re-installed a new
>> darshan-util? We should figure out what's going on there, too, to be safe.
>>
>> I'd also suggest two things for your config file:
>>
>> - Dial back your MODMEM and MAX_RECORDS values. Your MODMEM value is
>> asking Darshan to allocate a GiB of memory (it is expressed in MiB units
>> and you set it to 1024), which Darshan will happily try to do, though I'm
>> not sure that's a good idea. I'd probably start with a MODMEM value of 8
>> and MAX_RECORDS of 2000, and just double those again if needed -- anything
>> beyond that would be surprising unless you know your workload is really
>> opening hundreds of thousands of files. You might also have a look at the
>> files Darshan is currently instrumenting and see if you really want it to
>> -- I've noticed when instrumenting Python frameworks that you can get tons
>> of records for things like shared libraries, source files, etc. that can
>> just be ignored using the NAME_EXCLUDE mechanism.
>> - Add "NAMEMEM 2" to your config file to force Darshan to allocate
>> more memory (2 MiB) for storing the filenames associated with each record.
>> This might actually be the main reason your log is reporting partial data
>> rather than actually running out of module data, which is another reason
>> not to get too aggressive with the MODMEM/MAX_RECORDS parameters. I should
>> have mentioned this setting originally as there have been other users who
>> have reported exceeding it recently.
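>>
>> Putting those suggestions together, a starting config file might look
>> like this (the values are only a starting point; tune as needed):
>>
>> MODMEM 8
>> NAMEMEM 2
>> MAX_RECORDS 2000 POSIX,MPI-IO,STDIO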
>>
>> Hopefully that gets you further along and we can move onto the HDF5 issue
>> you mention.
>>
>> Thanks,
>> --Shane
>> ------------------------------
>> *From:* Jiří Nádvorník <nadvornik.ji at gmail.com>
>> *Sent:* Wednesday, April 27, 2022 6:37 AM
>> *To:* Snyder, Shane <ssnyder at mcs.anl.gov>
>> *Cc:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>> *Subject:* Re: [Darshan-users] Module contains incomplete data
>>
>> Aha! I just realized there is an obvious "prepare.sh" script that I
>> didn't run. I found that out by trial and error, though -- it could be
>> better documented :).
>>
>> Now I'm getting further. With this config file:
>> MAX_RECORDS 102400 POSIX,MPI-IO,STDIO
>> MODMEM 1024
>> APP_EXCLUDE git,ls
>>
>> I'm getting for:
>> darshan-parser --show-incomplete
>> caucau_python_id127447-127447_4-27-48556-1842455298968263838_1.darshan
>> |grep incomplete
>>
>> output:
>> # *WARNING*: The POSIX module contains incomplete data!
>> # *WARNING*: The STDIO module contains incomplete data!
>> Warning: no log utility handlers defined for module (null), SKIPPING.
>>
>> I don't think my poor tiny Python script touches more than 100,000
>> files, right?
>>
>> By the way, I've encountered another problem; I'm not sure whether to
>> put it in another thread. If I compile with HDF5 (the results above are
>> without it):
>> ./configure --with-log-path=/gpfs/raid/darshan-logs
>> --with-jobid-env=PBS_JOBID CC=mpicc --enable-hdf5-mod
>> --with-hdf5=/gpfs/raid/SDSSCube/ext_lib//hdf5-1.12.0/hdf5/
>>
>> It messes up my runtime and causes Python to crash:
>> mpirun -x DARSHAN_CONFIG_PATH=/gpfs/raid/SDSSCube/darshan.conf -x
>> LD_PRELOAD=/gpfs/raid/shared_libs/darshan/darshan-runtime/lib/.libs/libdarshan.so:/gpfs/raid/SDSSCube/ext_lib/hdf5-1.12.0/hdf5/lib/libhdf5.so
>> -np 65 --hostfile hosts --map-by node
>> /gpfs/raid/SDSSCube/venv_par/bin/python hisscube.py --truncate
>> ../sdss_data/ results/SDSS_cube_c_par.h5
>>
>> Resulting in:
>> INFO:rank[0]:Rank 0 pid: 137058
>> Darshan HDF5 module error: runtime library version (1.12) incompatible
>> with Darshan module (1.10-).
>> Traceback (most recent call last):
>> File "hisscube.py", line 74, in <module>
>> writer.ingest(fits_image_path, fits_spectra_path,
>> truncate_file=args.truncate)
>> File "/gpfs/raid/SDSSCube/hisscube/ParallelWriterMWMR.py", line 45, in
>> ingest
>> self.process_metadata(image_path, image_pattern, spectra_path,
>> spectra_pattern, truncate_file)
>> File "/gpfs/raid/SDSSCube/hisscube/CWriter.py", line 150, in
>> process_metadata
>> h5_file = self.open_h5_file_serial(truncate_file)
>> File "/gpfs/raid/SDSSCube/hisscube/CWriter.py", line 170, in
>> open_h5_file_serial
>> return h5py.File(self.h5_path, 'w', fs_strategy="page",
>> fs_page_size=4096, libver="latest")
>> File
>> "/gpfs/raid/SDSSCube/venv_par/lib/python3.8/site-packages/h5py-3.6.0-py3.8-linux-x86_64.egg/h5py/_hl/files.py",
>> line 533, in __init__
>> fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
>> File
>> "/gpfs/raid/SDSSCube/venv_par/lib/python3.8/site-packages/h5py-3.6.0-py3.8-linux-x86_64.egg/h5py/_hl/files.py",
>> line 232, in make_fid
>> fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
>> File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
>> File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
>> File "h5py/h5f.pyx", line 126, in h5py.h5f.create
>> File "h5py/defs.pyx", line 693, in h5py.defs.H5Fcreate
>> RuntimeError: Unspecified error in H5Fcreate (return value <0)
>>
>> You said Darshan should be compatible with HDF5 > 1.8, which 1.12 is,
>> right?
>>
>> Thanks for the help!
>>
>> Cheers,
>>
>> Jiri
>>
>>
>>
>>
>>
>>
>> On Wed, Apr 27, 2022 at 8:43 AM Jiří Nádvorník <nadvornik.ji at gmail.com>
>> wrote:
>>
>> Hi,
>>
>> I think I will chew through the documentation just fine, but two things
>> are not clear:
>>
>> 1. Does the darshan library provide its own config file that I need
>> to change, or do I always need to create my own?
>> 2. How can I build the git version? I didn't find any instructions
>> and the usual autoconf just throws:
>> root at kub-b1:/gpfs/raid/shared_libs/darshan/darshan-runtime# autoconf
>> configure.ac:19: error: possibly undefined macro: AC_CONFIG_MACRO_DIRS
>> If this token and others are legitimate, please use m4_pattern_allow.
>> See the Autoconf documentation.
>> configure.ac:21: error: possibly undefined macro: AM_INIT_AUTOMAKE
>> configure.ac:22: error: possibly undefined macro: AM_SILENT_RULES
>> configure.ac:23: error: possibly undefined macro: AM_MAINTAINER_MODE
>> configure.ac:713: error: possibly undefined macro: AM_CONDITIONAL
>> root at kub-b1:/gpfs/raid/shared_libs/darshan/darshan-runtime# ./configure
>> configure: error: cannot find install-sh, install.sh, or shtool in
>> ../maint/scripts "."/../maint/scripts
>>
>> Thanks for the help.
>>
>> Cheers,
>>
>> Jiri
>>
>> On Tue, Apr 26, 2022 at 5:43 PM Snyder, Shane <ssnyder at mcs.anl.gov>
>> wrote:
>>
>> Hi Jiri,
>>
>> For some background, Darshan enforces some internal memory limits to
>> avoid ballooning memory usage at runtime. Specifically, all of our
>> instrumentation modules should pre-allocate file records for up to 1,024
>> files opened by the app -- if your app opens more than 1,024 files
>> per-process, Darshan stops instrumenting and issues those warning messages
>> when parsing the log file.
>>
>> We have users hit this issue pretty frequently now, and we actually just
>> wrapped up development of some new mechanisms to help out with this. They
>> were just merged into our main branch, and we will be formally releasing a
>> pre-release version of this code in the next week or so. For the time
>> being, you should be able to use the 'main' branch of our repo (
>> https://github.com/darshan-hpc/darshan) to leverage this new
>> functionality.
>>
>> There are 2 new mechanisms that can help out, both of which require you
>> to provide a configuration file to Darshan at runtime:
>>
>> - MAX_RECORDS setting can be used to bump up the number of
>> pre-allocated records for different modules. In your case, you might try to
>> bump up the default number of records for the POSIX, MPI-IO, and STDIO
>> modules by setting something like this in your config file (this would
>> allow you to instrument up to 4000 files per-process for each of these
>> modules):
>> - MAX_RECORDS 4000 POSIX,MPI-IO,STDIO
>> - An alternative (or complementary) approach to bumping up the record
>> limit is to limit instrumentation to particular files. You can use the
>> NAME_EXCLUDE setting to avoid instrumenting specific directory paths, file
>> extensions, etc. by specifying regular expressions. E.g., the following
>> settings would avoid instrumenting files with a .so extension or files
>> located in a directory we don't care about, for all modules (* denotes
>> all modules):
>> - NAME_EXCLUDE .so$ *
>> - NAME_EXCLUDE ^/path/to/avoid *
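>>
>> Combined into a single config file, those settings would look like:
>>
>> MAX_RECORDS 4000 POSIX,MPI-IO,STDIO
>> NAME_EXCLUDE .so$ *
>> NAME_EXCLUDE ^/path/to/avoid *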
>>
>> I'm attaching the updated runtime documentation for Darshan for your
>> reference. Section 8 provides a ton of detail on how to provide a config
>> file to Darshan, which should help fill in any gaps in my description
>> above.
>>
>> Please let us know if you have any further questions or issues, though!
>>
>> Thanks,
>> --Shane
>> ------------------------------
>> *From:* Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on
>> behalf of Jiří Nádvorník <nadvornik.ji at gmail.com>
>> *Sent:* Sunday, April 24, 2022 3:00 PM
>> *To:* darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
>> *Subject:* [Darshan-users] Module contains incomplete data
>>
>> Hi All,
>>
>> I just tried out Darshan, and the potential output seems perfect for my
>> HDF5 MPI application! Unfortunately, I'm not able to get there :(.
>>
>> I have a log that has a big stamp "This darshan log contains incomplete
>> data".
>>
>> When I run:
>> darshan-parser --show-incomplete mylog.darshan |grep incomplete
>> Output is:
>> # *WARNING*: The POSIX module contains incomplete data!
>> # *WARNING*: The MPI-IO module contains incomplete data!
>> # *WARNING*: The STDIO module contains incomplete data!
>>
>> Would you be able to point me to some setting that would improve the
>> measurements? Can I actually rely on the profiling results if it says the
>> data is incomplete in some of the categories?
>>
>> Thank you very much for your help!
>>
>> Cheers,
>>
>> Jiri
>>
>>
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users