[Swift-user] follow up of kickstart executable not found problem
Mihael Hategan
hategan at mcs.anl.gov
Fri Sep 7 23:41:58 CDT 2007
Never mind that. Something is eating the empty stdin argument. This doesn't make
sense; fork used to behave correctly.
Can you add this to log4j.properties, run again, and post the log?
log4j.logger.org.globus.cog.abstraction=DEBUG
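The symptom described above, where an empty argument disappears, can be illustrated with a minimal sketch. It assumes a hypothetical wrapper that reads its settings strictly by position. The field order and values below are illustrative only and are not taken from Swift's wrapper.sh. Dropping a single empty string shifts every later field by one place.

```python
# Hypothetical field order for a wrapper script that reads its settings
# positionally ($1, $2, ...). This is NOT the actual order used by Swift's
# wrapper.sh; it only illustrates the failure mode.
FIELDS = ["DIR", "STDIN", "STDOUT", "STDERR", "DIRS", "LINKS", "OUTS"]


def parse_positional(argv):
    """Pair each field name with the argument at the same position.

    Fields with no corresponding argument are left as None.
    """
    return {name: (argv[i] if i < len(argv) else None)
            for i, name in enumerate(FIELDS)}


def drop_empty(argv):
    """Simulate a layer that silently discards empty-string arguments."""
    return [arg for arg in argv if arg != ""]


# Hypothetical invocation with an empty STDIN (no input redirection).
argv = ["jobdir", "", "out.txt", "err.txt", "outdir", "in.txt", "result.txt"]

correct = parse_positional(argv)
shifted = parse_positional(drop_empty(argv))

for name in FIELDS:
    print(f"{name:6} intended={correct[name]!r:14} received={shifted[name]!r}")
```

Every field after the dropped one picks up its neighbour's value and the last field ends up unset. One general way to avoid this class of bug is to pass settings by name (for example as KEY=VALUE pairs) rather than by position.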
On Fri, 2007-09-07 at 23:02 -0500, Mihael Hategan wrote:
> Can you post the workflow?
>
> On Fri, 2007-09-07 at 21:56 -0500, Jing Tie wrote:
> > Thanks!
> >
> > I checked the procedure and found that the filenames of the output
> > files of "myapp" match "File f", so the first possibility should
> > not be the problem.
> >
> > Then I tried a very simple Swift script that doesn't need R:
> > app function: add some lines to the input file to generate the output file
> > input file name: simpleFile.txt
> > output file name: simpleFile.output
> > application script location: $OSG_APP/osg/jtie/duplicate.sh
> > jobmanager: jobmanager-fork
> >
> > duplicate failed
> > The following errors have occurred:
> > 1. Application "duplicate" failed (Failed to link input file
> > /dscratch/osg/app/osg/jtie/duplicate.sh)
> > Arguments: "simpleFile.txt"
> > Host: NWICG_NotreDame
> > Directory: simple-wf-xgh9e9q1z0af2/duplicate-it7hyvgi
> > STDERR:
> > STDOUT:
> >
> > After execution, three empty directories (directories, not files, and
> > with odd names) were generated:
> > duplicate-gt7hyvgi-simpleFile.output
> > duplicate-ht7hyvgi-simpleFile.output
> > duplicate-it7hyvgi-simpleFile.output
> >
> > wrapper.log:
> > DIR=duplicate-gt7hyvgi
> > STDOUT=simpleFile.output
> > STDERR=stderr.txt
> > DIRS=simpleFile.output
> > LINKS=/dscratch/osg/app/osg/jtie/duplicate.sh
> > OUTS=simpleFile.txt
> > ln: creating symbolic link
> > `duplicate-gt7hyvgi//dscratch/osg/app/osg/jtie/duplicate.sh' to
> > `/dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/shared//dscratch/osg/app/osg/jtie/duplicate.sh':
> > No such file or directory
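The two paths in the ln message above look like an absolute path glued onto a directory with a plain "/". A minimal sketch of that kind of concatenation follows; the directory and file names are hypothetical, and it does not reproduce the wrapper's actual code.

```python
import posixpath


def concat_path(directory, name):
    """Join two path pieces with a literal "/", as shell "$DIR/$NAME" would."""
    return directory + "/" + name


# Hypothetical values: a relative job directory, a shared staging directory,
# and an absolute path that has ended up in the list of files to link.
job_dir = "job-1234"
shared_dir = "/tmp/run/shared"
entry = "/opt/apps/tool.sh"

link_name = concat_path(job_dir, entry)       # where the symlink would go
link_target = concat_path(shared_dir, entry)  # what it would point to

print(link_name)    # job-1234//opt/apps/tool.sh
print(link_target)  # /tmp/run/shared//opt/apps/tool.sh

# The directory that would have to hold the new link.
print(posixpath.dirname(link_name))  # job-1234//opt/apps

# posixpath.join treats an absolute second component differently:
# it discards everything before it.
print(posixpath.join(shared_dir, entry))  # /opt/apps/tool.sh
```

For `ln -s`, the target of a symbolic link does not have to exist, but the directory that will contain the link does. A directory such as job-1234//opt/apps is normally absent, which is enough for ln to report "No such file or directory".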
> >
> > under dir /dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/shared:
> > seq.sh, simpleFile.txt, wrapper.sh
> > under dir /dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/duplicate-it7hyvgi:
> > one directory - simpleFile.output
> > /dscratch/osg/data/osg/jtie/simple-wf-xgh9e9q1z0af2/duplicate-it7hyvgi/simpleFile.output:
> > empty
> >
> > I think the Swift program is right, since I ran it successfully on
> > localhost with duplicate.sh.
> >
> >
> > Many thanks,
> > Jing
> >
> >
> > On 9/7/07, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > So when the output files are not created, there can be two reasons:
> > > 1. The specification of what files should be created is broken. This is,
> > > at this time, done by looking at the filenames of the return values from
> > > the atomic procedure. Normally one passes those file names to the
> > > application as output file parameters. Example:
> > >
> > > (File f) proc(...) {
> > >     app {
> > >         myapp ... "-o" @filename(f);
> > >     }
> > > }
> > >
> > > 2. The specification is correct, but the application doesn't behave.
> > >
> > > Mihael
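The check described in point 1 can be sketched as follows. This is a minimal sketch, assuming the check simply compares the file names of the procedure's return values against what exists in the job directory after the run. `missing_outputs` and `check_outputs` are hypothetical names, not Swift's implementation, and treating a same-named directory as "missing" is a choice made only for this sketch.

```python
import os
import tempfile


def missing_outputs(job_dir, expected):
    """Return the expected output names that are not regular files in job_dir.

    A directory with the expected name counts as missing, because the
    application was supposed to produce a file (an assumption of this sketch).
    """
    return [name for name in expected
            if not os.path.isfile(os.path.join(job_dir, name))]


def check_outputs(job_dir, expected):
    """Raise if any expected output file is missing after the job ran."""
    missing = missing_outputs(job_dir, expected)
    if missing:
        raise RuntimeError(
            "The following output files were not created by the application: "
            + " ".join(missing))


# Hypothetical job directory: one output written as a file, one present only
# as an empty directory, and one never created at all.
with tempfile.TemporaryDirectory() as job_dir:
    with open(os.path.join(job_dir, "a.out"), "w") as f:
        f.write("data\n")
    os.mkdir(os.path.join(job_dir, "b.out"))
    result = missing_outputs(job_dir, ["a.out", "b.out", "c.out"])
    try:
        check_outputs(job_dir, ["a.out", "b.out"])
        message = ""
    except RuntimeError as err:
        message = str(err)

print(result)   # ['b.out', 'c.out']
print(message)
```

A check of this shape only tells you that a declared output is absent; it cannot distinguish point 1 (wrong names declared) from point 2 (the application never wrote the file).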
> > >
> > > On Fri, 2007-09-07 at 12:35 -0500, Jing Tie wrote:
> > > > Hi,
> > > >
> > > > I tried jobmanager-fork instead of jobmanager-condor on the osg.hpcc.nd.edu site
> > > > (recall that this site used to throw the exception "kickstart executable
> > > > (101-FBchannel18_cwt-avgResults.Rdata) not found"):
> > > >
> > > > jobmanager-fork:
> > > > ------------------------
> > > > Application exception: The following output files were not created by
> > > > the application: /dscratch/osg/app/osg/jtie/SIDGrid/wavelet.sh
> > > >
> > > > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/cwtsmall-0bhnnvgi
> > > > lrwxrwxrwx 1 osg osgusers 93 Sep 7 12:42
> > > > 101-FBchannel10_cwt-avgResults.Rdata ->
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/shared/101-FBchannel10_cwt-avgResults.Rdata
> > > > ... (28 links in total, the same as the number of expected output files)
> > > > drwxr-xr-x 2 osg osgusers 4096 Sep 7 12:42 101_FB-epochs.Rdata
> > > > drwxr-xr-x 3 osg osgusers 4096 Sep 7 12:42 scripts
> > > > -rw-r--r-- 1 osg osgusers 58 Sep 7 12:42 stderr.txt
> > > >
> > > > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/shared
> > > > -rw-r--r-- 1 osg osgusers 4109752 Sep 7 12:41 101_FB-epochs.Rdata
> > > > drwxr-xr-x 2 osg osgusers 4096 Sep 7 12:41 scripts
> > > > -rw-r--r-- 1 osg osgusers 571 Sep 7 12:41 seq.sh
> > > > -rw-r--r-- 1 osg osgusers 3278 Sep 7 12:41 wrapper.sh
> > > >
> > > > The kickstart directory is empty.
> > > >
> > > > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/cat
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/status/cwtsmall-0bhnnvgi-error
> > > > The following output files were not created by the application:
> > > > /dscratch/osg/app/osg/jtie/SIDGrid/wavelet.sh
> > > >
> > > >
> > > > jobmanager-condor:
> > > > -------------------------------
> > > > Application exception: The following output files were not created by
> > > > the application: 101-FBchannel20_cwt-avgResults.Rdata
> > > >
> > > > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/cwtsmall-cpteovgi
> > > > lrwxrwxrwx 1 osg osgusers 76 Sep 7 13:01 101_FB-epochs.Rdata ->
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/shared/101_FB-epochs.Rdata
> > > > drwxr-xr-x 3 osg osgusers 4096 Sep 7 13:01 scripts
> > > > -rw-r--r-- 1 osg osgusers 70 Sep 7 13:01 stderr.txt
> > > >
> > > > globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> > > > /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/shared
> > > > -rw-r--r-- 1 osg osgusers 4109752 Sep 7 13:00 101_FB-epochs.Rdata
> > > > drwxr-xr-x 2 osg osgusers 4096 Sep 7 13:00 scripts
> > > > -rw-r--r-- 1 osg osgusers 571 Sep 7 13:00 seq.sh
> > > > -rw-r--r-- 1 osg osgusers 3278 Sep 7 13:00 wrapper.sh
> > > >
> > > > I think the exception descriptions are correct now. The
> > > > difference between fork and condor is that fork created the output
> > > > links to the shared directory, but condor didn't. The essential
> > > > problem, though, is that the output files are not being created. I will
> > > > do more experiments to see whether the problem lies with the file
> > > > system or with the application.
> > > >
> > > > Thanks,
> > > > Jing
> > > > _______________________________________________
> > > > Swift-user mailing list
> > > > Swift-user at ci.uchicago.edu
> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> > > >
> > >
> > >
> >
>
>