[Swift-user] Success with fork, but exception in getFile with condor

Mihael Hategan hategan at mcs.anl.gov
Tue Sep 11 09:42:26 CDT 2007


On Tue, 2007-09-11 at 00:09 -0500, Jing Tie wrote:
> Hi,
> 
> Thanks! Is it possible that the status file was generated in an
> unexpected directory?

Very unlikely.

> 
> I ran the SID application on another site, atlas.dpcc.uta.edu
> (jobmanager-pbs), and it succeeded! But on site u2-grid.ccr.buffalo.edu
> (jobmanager-pbs), there was an execution error after task submission:
> "FileResourceCache Maximum idle time exceeded. Removing resource for
> gsiftp://u2-grid.ccr.buffalo.edu".

That's not an error. Idle GridFTP connections are removed from the cache
after a while, and that message just reports such a removal. Your log
simply shows that nothing is happening.
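
To illustrate what "removed from the cache after a while" means, here is
a minimal sketch of an idle-timeout cache. This is not the actual
FileResourceCache code; the class name, method names, and the injectable
clock are all hypothetical, chosen only to show the idea that an entry
is dropped once it has gone unused for longer than some maximum idle
time.

```python
import time


class IdleResourceCache:
    """Hypothetical cache that drops resources unused for too long."""

    def __init__(self, max_idle, clock=time.monotonic):
        self.max_idle = max_idle      # seconds a resource may sit unused
        self.clock = clock            # injectable so the sketch is testable
        self._last_used = {}          # resource key -> time of last use

    def touch(self, key):
        """Record that the resource identified by key was just used."""
        self._last_used[key] = self.clock()

    def evict_idle(self):
        """Remove and return the keys that exceeded the maximum idle time."""
        now = self.clock()
        expired = [k for k, t in self._last_used.items()
                   if now - t > self.max_idle]
        for k in expired:
            del self._last_used[k]
        return expired

    def __contains__(self, key):
        return key in self._last_used
```

An eviction like this is routine housekeeping: it says the connection
was not being used, not that a transfer failed.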

Mihael

> Logs are attached (sid*.log: u2-grid.ccr.buffalo.edu;
> simple*.log: cmsgrid01.hep.wisc.edu).
> 
> Thanks,
> Jing
> 
> On 9/10/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > The wrapper produces exactly one status file: <jobid>-success or
> > <jobid>-error. If neither is present, then either the wrapper didn't
> > write one (very unlikely, and it would be due to something weird that
> > I'm missing), or GridFTP on the head node doesn't see what the
> > wrapper has written.
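
One way to tell those two cases apart is to look for the status file
from both the worker node and the head node. The sketch below is only an
illustration: the helper name and the idea of passing in the job
directory are assumptions, and the real location of the status files
depends on your site configuration. It relies only on the
<jobid>-success / <jobid>-error naming described above.

```python
from pathlib import Path


def find_status(job_dir, job_id):
    """Return 'success', 'error', or None if no status file is visible.

    job_dir is a hypothetical path to wherever the wrapper writes its
    status files on this machine.
    """
    job_dir = Path(job_dir)
    for outcome in ("success", "error"):
        if (job_dir / f"{job_id}-{outcome}").exists():
            return outcome
    return None
```

If this returns a result on the worker node but None on the head node
for the same job, the shared filesystem is the likely culprit rather
than the wrapper.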
> >
> > On Mon, 2007-09-10 at 16:42 -0500, Jing Tie wrote:
> > > Hi,
> > >
> > > I think there is a problem running Swift scripts with
> > > jobmanager-condor on some OSG sites. I ran simple-wf.dtm (a very
> > > simple Swift script that copies the content of an input file to an
> > > output file) and the SID script on the GLOW site separately.
> > > Everything works when running with jobmanager-fork, but
> > > "exception in getFile" occurred with jobmanager-condor. The log from
> > > the Swift client is attached. However, no log/info/output files were
> > > generated in the Swift work cache, nor was any duplicate-***
> > > directory, though according to the log file the directory seemed to
> > > have been created.
> > >
> > > The GLOW site (cmsgrid01.hep.wisc.edu) can successfully run
> > > globus-url-copy to copy files between OSG_DATA and OSG_WN_TMP.
> > >
> > > Exception:
> > > Task(type=2, identity=urn:0-0-1189455037519) setting status to Failed
> > > Exception in getFile
> > > File transfer failed
> > > duplicate failed
> > > The following errors have occurred:
> > > 1. Application "duplicate" failed (No status file was found. Check the
> > > shared filesystem on GLOW)
> > >         Arguments: "simpleFile.txt"
> > >         Host: GLOW
> > >         Directory: simple-wf-7l8vqstrkud90/duplicate-7niqt1hi
> > >         STDERR:
> > >         STDOUT:
> > >
> > > Thanks,
> > > Jing
> > > _______________________________________________
> > > Swift-user mailing list
> > > Swift-user at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >
> >



