[Swift-user] follow up of kickstart executable not found problem

Jing Tie tiejing at gmail.com
Fri Sep 7 21:56:31 CDT 2007


Thanks!

I checked the procedure and found that the filenames of the output
files passed to "myapp" are the same as those of "File f", so the first
possibility should not be the problem.

Then I tried a very simple swift script that doesn't need R:
app function: add some lines to the input file to generate the output file
input file name: simpleFile.txt
output file name: simpleFile.output
application script location: $OSG_APP/osg/jtie/duplicate.sh
jobmanager: jobmanager-fork
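
For readers without access to the site, here is a rough sketch of what a script of this kind could look like. It is a hypothetical stand-in, not the actual duplicate.sh (which is not shown in this thread). It assumes the input file name arrives as the first argument and the result is written to stdout, which matches the Arguments and STDOUT entries in the logs below; the temporary directory and the sample lines are made up for the local check.

```shell
#!/bin/sh
# Hypothetical stand-in for duplicate.sh; the real script is not shown here.
# It writes the input file to stdout and then appends the same lines again.
duplicate() {
    cat "$1" && cat "$1"
}

# Local check in a scratch directory (paths and contents are made up).
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/simpleFile.txt"
duplicate "$workdir/simpleFile.txt" > "$workdir/simpleFile.output"

# Print the line count of the generated output file.
wc -l < "$workdir/simpleFile.output"
```

Running it locally like this only confirms that the script produces a regular output file; it says nothing about how the remote wrapper stages files.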

duplicate failed
The following errors have occurred:
1. Application "duplicate" failed (Failed to link input file
/dscratch/osg/app/osg/jtie/duplicate.sh)
        Arguments: "simpleFile.txt"
        Host: NWICG_NotreDame
        Directory: simple-wf-xgh9e9q1z0af2/duplicate-it7hyvgi
        STDERR:
        STDOUT:

After execution, three empty directories (not files, and with odd
names) had been created:
duplicate-gt7hyvgi-simpleFile.output
duplicate-ht7hyvgi-simpleFile.output
duplicate-it7hyvgi-simpleFile.output

wrapper.log:
DIR=duplicate-gt7hyvgi
STDOUT=simpleFile.output
STDERR=stderr.txt
DIRS=simpleFile.output
LINKS=/dscratch/osg/app/osg/jtie/duplicate.sh
OUTS=simpleFile.txt
ln: creating symbolic link
`duplicate-gt7hyvgi//dscratch/osg/app/osg/jtie/duplicate.sh' to
`/dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/shared//dscratch/osg/app/osg/jtie/duplicate.sh':
No such file or directory
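
One likely reading of this message: ln reports "No such file or directory" when the directory that should contain the new link does not exist. Here the link name is the job directory followed by the absolute path from LINKS, so it needs a dscratch/osg/app/osg/jtie subdirectory inside the job directory. A minimal sketch that reproduces the same behaviour locally, assuming made-up stand-in paths (root, shared, jobdir and /app/duplicate.sh are not the site paths):

```shell
#!/bin/sh
# Reproduce the ln failure in a scratch directory.
# All paths here are hypothetical stand-ins for the site paths in wrapper.log.
root=$(mktemp -d)
mkdir -p "$root/shared" "$root/jobdir"
abs="/app/duplicate.sh"   # plays the role of the absolute path listed under LINKS

# The link name is jobdir + absolute path. jobdir/app/ does not exist yet,
# so ln cannot create the link and fails with "No such file or directory".
if ln -s "$root/shared/$abs" "$root/jobdir/$abs" 2>/dev/null; then
    first=ok
else
    first=failed
fi

# Once the parent directory exists, the same ln call succeeds. The link is
# dangling, because shared/app/duplicate.sh was never staged, but ln -s
# does not require the target to exist.
mkdir -p "$root/jobdir/app"
if ln -s "$root/shared/$abs" "$root/jobdir/$abs"; then
    second=ok
else
    second=failed
fi

echo "first=$first second=$second"
```

This only illustrates the ln behaviour. Why the application script's absolute path ended up in LINKS as an input file is a separate question.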

under dir /dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/shared:
seq.sh, simpleFile.txt, wrapper.sh
under dir /dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/duplicate-it7hyvgi:
one directory - simpleFile.output
/dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/duplicate-it7hyvgi/simpleFile.output:
empty

I think the Swift program itself is correct, since it ran successfully
on localhost with duplicate.sh.


Many thanks,
Jing


On 9/7/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> So when the output files are not created, there can be two reasons:
> 1. The specification of what files should be created is broken. This is,
> at this time, done by looking at the filenames of the return values from
> the atomic procedure. Normally one passes those file names to the
> application as output file parameters. Example:
>
> (File f) proc(...) {
>   app {
>     myapp ... "-o" @filename(f);
>   }
> }
>
> 2. The specification is correct, but the application doesn't behave.
>
> Mihael
>
> On Fri, 2007-09-07 at 12:35 -0500, Jing Tie wrote:
> > Hi,
> >
> > I tried jobmanager-fork instead of jobmanager-condor on osg.hpcc.nd.edu site:
> > <recall the site used to have the exception - kickstart executable
> > (101-FBchannel18_cwt-avgResults.Rdata) not found>
> >
> > jobmanager-fork:
> > ------------------------
> > Application exception: The following output files were not created by
> > the application: /dscratch/osg/app/osg/jtie/SIDGrid/wavelet.sh
> >
> > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/cwtsmall-0bhnnvgi
> > lrwxrwxrwx  1 osg osgusers   93 Sep  7 12:42
> > 101-FBchannel10_cwt-avgResults.Rdata ->
> > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/shared/101-FBchannel10_cwt-avgResults.Rdata
> > ... (total 28 links, the same number as the number of the expected output files)
> > drwxr-xr-x  2 osg osgusers 4096 Sep  7 12:42 101_FB-epochs.Rdata
> > drwxr-xr-x  3 osg osgusers 4096 Sep  7 12:42 scripts
> > -rw-r--r--  1 osg osgusers   58 Sep  7 12:42 stderr.txt
> >
> > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/shared
> > -rw-r--r--  1 osg osgusers 4109752 Sep  7 12:41 101_FB-epochs.Rdata
> > drwxr-xr-x  2 osg osgusers    4096 Sep  7 12:41 scripts
> > -rw-r--r--  1 osg osgusers     571 Sep  7 12:41 seq.sh
> > -rw-r--r--  1 osg osgusers    3278 Sep  7 12:41 wrapper.sh
> >
> > empty kickstart directory
> >
> > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/cat
> > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/status/cwtsmall-0bhnnvgi-error
> > The following output files were not created by the application:
> > /dscratch/osg/app/osg/jtie/SIDGrid/wavelet.sh
> >
> >
> > jobmanager-condor:
> > -------------------------------
> > Application exception: The following output files were not created by
> > the application: 101-FBchannel20_cwt-avgResults.Rdata
> >
> > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/cwtsmall-cpteovgi
> > lrwxrwxrwx  1 osg osgusers   76 Sep  7 13:01 101_FB-epochs.Rdata ->
> > /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/shared/101_FB-epochs.Rdata
> > drwxr-xr-x  3 osg osgusers 4096 Sep  7 13:01 scripts
> > -rw-r--r--  1 osg osgusers   70 Sep  7 13:01 stderr.txt
> >
> > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/shared
> > -rw-r--r--  1 osg osgusers 4109752 Sep  7 13:00 101_FB-epochs.Rdata
> > drwxr-xr-x  2 osg osgusers    4096 Sep  7 13:00 scripts
> > -rw-r--r--  1 osg osgusers     571 Sep  7 13:00 seq.sh
> > -rw-r--r--  1 osg osgusers    3278 Sep  7 13:00 wrapper.sh
> >
> > I think the exception descriptions are all correct now. The
> > difference between fork and condor was that fork created the output
> > links to the shared directory, but condor didn't. But the essential
> > problem is that the output files are not being created. I will do more
> > experiments to see whether the problem lies in the file system or in
> > the application.
> >
> > Thanks,
> > Jing
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >
>
>


