[Swift-user] Success with fork, but exception in getFile with condor

Anand Padmanabhan anand-padmanabhan-1 at uiowa.edu
Wed Sep 12 14:44:24 CDT 2007


Hi Mihael,

The OSG troubleshooting team has been working with Jing to identify and 
correct the problems she is having when running on the OSG infrastructure.

Looking at the logs Jing sent us, on one of the OSG sites (possibly a few 
more) she is getting the following messages:
...
2007-09-10 15:46:37,446 DEBUG vdl:execute2 Application exception: No 
status file was found. Check the shared filesystem on GLOW
...
2007-09-10 15:46:37,498 DEBUG DelegatedFileTransferHandler File transfer 
with resource remote->tmp
2007-09-10 15:46:37,730 DEBUG DelegatedFileTransferHandler Exception in 
transfer
org.globus.cog.abstraction.impl.file.FileResourceException: Exception in 
getFile
...

I don't think I have a clear understanding of what this error means. 
Does it mean there was an application error because the application did 
not find the files it was expecting, or do you think it is a problem 
with the OSG infrastructure? If it is the latter, could you tell me 
exactly what Swift was trying to do at these steps when it failed?
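
For reference, my reading of the thread quoted below is that the wrapper 
writes exactly one status file, <jobid>-success or <jobid>-error, and 
that the "No status file was found" message appears when neither is 
visible. A minimal sketch of that check follows, so it can be run by 
hand on a site. The function name, the directory argument, and the idea 
of running it separately on the head node and on a worker node are my 
assumptions; this is not Swift's actual code.

```python
import os
from typing import Optional


def find_status(status_dir: str, job_id: str) -> Optional[str]:
    """Report which status file is visible for job_id in status_dir.

    Returns "success" or "error" if the matching <jobid>-success or
    <jobid>-error file exists, and None if neither can be seen.
    status_dir is a hypothetical location; the real path depends on
    how the site and the work directory are configured.
    """
    for outcome in ("success", "error"):
        if os.path.exists(os.path.join(status_dir, f"{job_id}-{outcome}")):
            return outcome
    return None
```

If this returned a value on a worker node but None on the head node for 
the same directory, that would point at the shared filesystem rather 
than at the application.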

Thanks,
Anand

Mihael Hategan wrote:
> On Tue, 2007-09-11 at 00:09 -0500, Jing Tie wrote:
>> Hi,
>>
>> Thanks! Is it possible that the status file was generated in an
>> unexpected directory?
> 
> Very unlikely.
>> I ran the SID application on another site, atlas.dpcc.uta.edu
>> (jobmanager-pbs), and it succeeded! But on site u2-grid.ccr.buffalo.edu
>> (jobmanager-pbs), there was an execution error after task submission:
>> "FileResourceCache Maximum idle time exceeded. Removing resource for
>> gsiftp://u2-grid.ccr.buffalo.edu".
> 
> That's not an error. Idle GridFTP connections are removed from the cache
> after a while. Your log shows simply that nothing is happening. 
> 
> Mihael
> 
>> Logs are attached (sid*.log ---
>> u2-grid.ccr.buffalo.edu, simple*.log --- cmsgrid01.hep.wisc.edu).
>>
>> Thanks,
>> Jing
>>
>> On 9/10/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>> The wrapper produces exactly one status file: <jobid>-success or
>>> <jobid>-error. If neither is present, then either the wrapper didn't
>>> write one (very unlikely, unless there is something I'm missing), or
>>> GridFTP on the head node doesn't see what the wrapper has written.
>>>
>>> On Mon, 2007-09-10 at 16:42 -0500, Jing Tie wrote:
>>>> Hi,
>>>>
>>>> I think there is a problem running Swift scripts with jobmanager-condor
>>>> on some OSG sites. I ran simple-wf.dtm (a very simple Swift script that
>>>> copies the content of an input file to an output file) and the SID script
>>>> on the GLOW site separately. Everything works with jobmanager-fork, but
>>>> "exception in getFile" occurs with jobmanager-condor. The log from the
>>>> Swift client is attached. However, no log/info/output files were
>>>> generated in the Swift work cache, nor was any duplicate-***
>>>> directory, although according to the log file the directory appears to
>>>> have been created.
>>>>
>>>> The site GLOW (cmsgrid01.hep.wisc.edu) can successfully run
>>>> globus-url-copy to copy files between OSG_DATA and OSG_WN_TMP.
>>>>
>>>> Exception:
>>>> Task(type=2, identity=urn:0-0-1189455037519) setting status to Failed
>>>> Exception in getFile
>>>> File transfer failed
>>>> duplicate failed
>>>> The following errors have occurred:
>>>> 1. Application "duplicate" failed (No status file was found. Check the
>>>> shared filesystem on GLOW)
>>>>         Arguments: "simpleFile.txt"
>>>>         Host: GLOW
>>>>         Directory: simple-wf-7l8vqstrkud90/duplicate-7niqt1hi
>>>>         STDERR:
>>>>         STDOUT:
>>>>
>>>> Thanks,
>>>> Jing
>>>> _______________________________________________
>>>> Swift-user mailing list
>>>> Swift-user at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>>
> 


