[Swift-user] Success with fork, but exception in getFile with condor

Anand Padmanabhan anand-padmanabhan-1 at uiowa.edu
Fri Sep 14 14:08:42 CDT 2007


> 
>>> After the job is done, Swift, from the comfort of the submit host,
>>> checks, through GridFTP, first whether the success file is there, and if
>>> not whether the error file is there. It finds none, which means that
>>> these files, although presumably written by the wrapper on the worker
>>> node, cannot be seen on the head node through GridFTP.
>>>
>>> So it looks to me like there might be something wrong with the file
>>> system?
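For reference, the check described above can be sketched roughly as follows. This is only an illustration, not Swift's actual code: the marker file names and the `exists` callback (a stand-in for a GridFTP existence check) are assumptions.

```python
from typing import Callable


def job_status(job_dir: str, job_id: str, exists: Callable[[str], bool]) -> str:
    """Return 'success', 'error', or 'missing' for one job."""
    # Hypothetical marker names; Swift's real file names may differ.
    success_path = f"{job_dir}/{job_id}-success"
    error_path = f"{job_dir}/{job_id}-error"
    # Look for the success marker first, then fall back to the error marker.
    if exists(success_path):
        return "success"
    if exists(error_path):
        return "error"
    # Neither marker is visible from the head node: the case reported here.
    return "missing"


# A plain set stands in for what a remote listing would show.
visible = {"run1/job42-success"}
print(job_status("run1", "job42", visible.__contains__))
print(job_status("run1", "job43", visible.__contains__))
```

A "missing" result only says the files are not visible through the path being checked; it does not say whether the wrapper wrote them somewhere else.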
>> Are there any logs that Swift or the application writes on the server
>> side that might record whether it had a problem writing these
>> output/error files?
> 
> Yes. Jing can help you with finding these. Basically they are
> <workflow-id>/info/<job-id>-info
Jing, could you send me this file? Also, is it possible to add logging
statements to the application, so that we can get more information from
this file if needed?
> 
>> Also, I know that on some Condor systems, job executables get dumped
>> into a temporary directory on a worker node's local file system. Would
>> this have any effect on Swift?
> 
> As long as Condor/the job manager honors the directory RSL setting, this
> shouldn't make any difference.
We need to make sure this is the case. I know we had an earlier problem
at FNAL_FERMIGRID, where the initial dir Globus parameter was not
respected. You can find details at
https://twiki.grid.iu.edu/twiki/bin/view/Troubleshooting/NewUserRunningJobsFailureFNAL

Is there a way to get the RSL that Swift submits to the site?
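
Once we have the RSL text, something like the sketch below could confirm whether a directory attribute is set at all. It is a simplified illustration: the sample RSL string is made up, and the regular expression does not handle nested relations or values containing parentheses.

```python
import re
from typing import Optional

# Matches a single (directory=...) relation; attribute name is matched
# case-insensitively. Simplified: no nested relations or escaped quotes.
_DIRECTORY = re.compile(r"\(\s*directory\s*=\s*([^()]*?)\s*\)", re.IGNORECASE)


def rsl_directory(rsl: str) -> Optional[str]:
    """Return the value of the directory attribute, or None if absent."""
    match = _DIRECTORY.search(rsl)
    if match is None:
        return None
    # Drop surrounding double quotes if the value was quoted.
    return match.group(1).strip('"')


# Made-up example string, not taken from a real Swift submission.
sample = "&(executable=/bin/hostname)(directory=/tmp/swift-run)(count=1)"
print(rsl_directory(sample))
```

If this returns None for the RSL Swift submits, the job manager has nothing to honor; if it returns a path, the question becomes whether the site respects it.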

Thanks
Anand


