[Darshan-users] Darshan unable to write log files

Phil Carns carns at mcs.anl.gov
Thu Feb 26 16:19:04 CST 2015


On 02/26/2015 04:54 PM, Latham, Robert J. wrote:
>
> On 02/26/2015 03:16 PM, Gunter, David O wrote:
>> Good call, Rob. Thanks!
>>
>> Saving to a non-Panasas directory worked, as did writing to Panasas with the ufs: prefix.
> Ok, super strange.  When darshan writes the log files, it does it
> through MPI-IO, so if one can write via MPI-IO to your Panasas file
> system, why can't darshan?

The only unusual thing that I can think of in Darshan's log-writing path
is that it sets a few MPI-IO hints.  Specifically, it uses this by
default:  "romio_no_indep_rw=true;cb_nodes=4".  The first hint triggers
the use of deferred opens, and the second limits the number of
aggregators to 4.

You could try turning those hints off by setting the
CP_LOG_HINTS_OVERRIDE environment variable to an empty string
(CP_LOG_HINTS_OVERRIDE="") before running your application, and of
course aiming the logs at your Panasas volume without using a ufs:
prefix.

Those hints should be harmless in theory, but it might be worth a shot.
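
As a rough sketch of that experiment (not a verified recipe), something
like the following should do.  It assumes DARSHAN_LOGPATH is picked up
from the environment the way David describes, and it reuses the log path
and command line quoted in this thread.  The check around mpirun is only
there so the snippet does nothing harmful if the binary is missing.

```shell
#!/bin/sh
# Sketch only: the path and command line are copied from this thread.

# Aim the logs at the Panasas volume directly, with no "ufs:" prefix.
export DARSHAN_LOGPATH=/scratch/dog/darshan_logs/

# Set the variable to an empty string so the default hints
# ("romio_no_indep_rw=true;cb_nodes=4") are not applied.
export CP_LOG_HINTS_OVERRIDE=""

# Show what the application will inherit.
echo "DARSHAN_LOGPATH=${DARSHAN_LOGPATH}"
echo "CP_LOG_HINTS_OVERRIDE='${CP_LOG_HINTS_OVERRIDE}'"

# Run the application only if both the launcher and the binary exist.
if command -v mpirun >/dev/null 2>&1 && [ -x ./higrad_driver_noCBE ]; then
    mpirun -n 16 ./higrad_driver_noCBE ./params.16pe.in
else
    echo "mpirun or ./higrad_driver_noCBE not found; skipping the run"
fi
```

Depending on the MPI launcher, the variables may need to be forwarded to
the ranks explicitly.  If the log is written correctly with the hints
disabled, that would point at one of the two hints.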

thanks,
-Phil

> The error "MPI_ERR_IO: input/output error" is not very helpful.  It's
> the generic error you get when the failure was not one of
> ENAMETOOLONG, ENOENT, ENOTDIR, ELOOP, EACCES, or EROFS.
>
> Last summer I committed some code to ROMIO to also catch EDQUOT, ENOSPC,
> and EEXIST.
>
> But even in the "all other cases" case, ROMIO's supposed to call
> strerror() and give you something -- anything! -- more helpful than "Uh,
> some IO error happened"
>
> What MPI implementation are you using?
>
> ==rob
>
>
>
>> -david
>> --
>> David Gunter
>> HPC-5: Applications Readiness Team
>>
>>
>>
>>
>>> On Feb 26, 2015, at 2:04 PM, Rob Latham <robl at mcs.anl.gov> wrote:
>>>
>>>
>>>
>>> On 02/26/2015 02:54 PM, Gunter, David O wrote:
>>>
>>>> My app writes to a Panasas file system, /scratch/dog/test_prob and I have set
>>>> DARSHAN_LOGPATH to /scratch/dog/darshan_logs/
>>> This bit about Panasas is the only thing that looks out of the ordinary to me.
>>>
>>> Can you try a non-Panasas file system?  If not, can you try prefixing the path with ufs: (DARSHAN_LOGPATH=ufs:/scratch/dog/darshan_logs/)?
>>>
>>>
>>> Did you set up the year/month/day directories?
>>> (darshan-runtime/darshan-mk-log-dirs.pl )
>>>
>>> ==rob
>>>
>>>> The permissions on the directory are good. My MPI job runs to completion and then I get the error message.
>>>>
>>>> $ mpirun -n 16 ./higrad_driver_noCBE ./params.16pe.in
>>>>
>>>> Total time for simulation = 69.468311
>>>> HiGrad simulation is complete!!
>>>> Shutting down MPI environment!
>>>> darshan library warning: unable to open log file /scratch/dog/darshan_logs/dog_higrad_driver_noCBE_id501442_2-26-49816-14771330631875741401.darshan_partial: MPI_ERR_IO: input/output error
>>>> darshan library warning: unable to write log file /scratch/dog/darshan_logs/dog_higrad_driver_noCBE_id501442_2-26-49816-14771330631875741401.darshan_partial
>>>>
>>>> --
>>>> David Gunter
>>>> HPC-5: Applications Readiness Team
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Darshan-users mailing list
>>>> Darshan-users at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>
>>> --
>>> Rob Latham
>>> Mathematics and Computer Science Division
>>> Argonne National Lab, IL USA
