[Swift-user] follow up of kickstart executable not found problem

Mihael Hategan hategan at mcs.anl.gov
Fri Sep 7 12:40:32 CDT 2007


So when the output files are not created, there can be two reasons:
1. The specification of what files should be created is broken. That
specification is currently derived from the filenames of the return
values of the atomic procedure. Normally one passes those file names to
the application as output file parameters. Example:

(File f) proc(...) {
  app {
    myapp ... "-o" @filename(f);
  }
}

2. The specification is correct, but the application doesn't behave: it
never writes the files it was told to write.
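
To tell the two cases apart, it can help to compare the declared output
names against what is actually present in the job directory. The
following is a minimal sketch of such a check, not the logic Swift's
wrapper actually runs; the function name `missing_outputs` and the demo
file names are hypothetical. It treats a dangling symlink (a link whose
target was never written) as a missing file.

```python
import os
import tempfile


def missing_outputs(job_dir, expected):
    """Return the expected output names that do not exist under job_dir.

    os.path.exists follows symlinks, so a link that points at a file
    which was never written is reported as missing as well.
    """
    return [name for name in expected
            if not os.path.exists(os.path.join(job_dir, name))]


if __name__ == "__main__":
    # Hypothetical demo in a throwaway directory: one real file,
    # one dangling link, and one name that is absent altogether.
    with tempfile.TemporaryDirectory() as job_dir:
        open(os.path.join(job_dir, "a.out"), "w").close()
        os.symlink(os.path.join(job_dir, "nowhere"),
                   os.path.join(job_dir, "b.out"))
        print(missing_outputs(job_dir, ["a.out", "b.out", "c.out"]))
```

If a name reported as missing is not one the application was ever meant
to write, the specification is the likely culprit (case 1). If the names
are the right ones, the application is (case 2).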

Mihael

On Fri, 2007-09-07 at 12:35 -0500, Jing Tie wrote:
> Hi,
> 
> I tried jobmanager-fork instead of jobmanager-condor on osg.hpcc.nd.edu site:
> <recall the site used to have the exception - kickstart executable
> (101-FBchannel18_cwt-avgResults.Rdata) not found>
> 
> jobmanager-fork:
> ------------------------
> Application exception: The following output files were not created by
> the application: /dscratch/osg/app/osg/jtie/SIDGrid/wavelet.sh
> 
> globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/cwtsmall-0bhnnvgi
> lrwxrwxrwx  1 osg osgusers   93 Sep  7 12:42
> 101-FBchannel10_cwt-avgResults.Rdata ->
> /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/shared/101-FBchannel10_cwt-avgResults.Rdata
> ... (total 28 links, the same number as the number of the expected output files)
> drwxr-xr-x  2 osg osgusers 4096 Sep  7 12:42 101_FB-epochs.Rdata
> drwxr-xr-x  3 osg osgusers 4096 Sep  7 12:42 scripts
> -rw-r--r--  1 osg osgusers   58 Sep  7 12:42 stderr.txt
> 
> globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/shared
> -rw-r--r--  1 osg osgusers 4109752 Sep  7 12:41 101_FB-epochs.Rdata
> drwxr-xr-x  2 osg osgusers    4096 Sep  7 12:41 scripts
> -rw-r--r--  1 osg osgusers     571 Sep  7 12:41 seq.sh
> -rw-r--r--  1 osg osgusers    3278 Sep  7 12:41 wrapper.sh
> 
> empty kickstart directory
> 
> globus-job-run osg.hpcc.nd.edu/jobmanager /bin/cat
> /dscratch/osg/data/osg/jtie/sid-wf1-2g48a936yixu1/status/cwtsmall-0bhnnvgi-error
> The following output files were not created by the application:
> /dscratch/osg/app/osg/jtie/SIDGrid/wavelet.sh
> 
> 
> jobmanager-condor:
> -------------------------------
> Application exception: The following output files were not created by
> the application: 101-FBchannel20_cwt-avgResults.Rdata
> 
> globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/cwtsmall-cpteovgi
> lrwxrwxrwx  1 osg osgusers   76 Sep  7 13:01 101_FB-epochs.Rdata ->
> /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/shared/101_FB-epochs.Rdata
> drwxr-xr-x  3 osg osgusers 4096 Sep  7 13:01 scripts
> -rw-r--r--  1 osg osgusers   70 Sep  7 13:01 stderr.txt
> 
> globus-job-run osg.hpcc.nd.edu/jobmanager /bin/ls -al
> /dscratch/osg/data/osg/jtie/sid-wf1-3my5pn01t3ov0/shared
> -rw-r--r--  1 osg osgusers 4109752 Sep  7 13:00 101_FB-epochs.Rdata
> drwxr-xr-x  2 osg osgusers    4096 Sep  7 13:00 scripts
> -rw-r--r--  1 osg osgusers     571 Sep  7 13:00 seq.sh
> -rw-r--r--  1 osg osgusers    3278 Sep  7 13:00 wrapper.sh
> 
> I think the exception descriptions are all right now. The
> difference between fork and condor is that fork created the output
> links to the shared directory, but condor didn't. But the essential
> problem is that the output files are not being created. I will do more
> experiments to see whether the problem is in the file system or the
> application.
> 
> Thanks,
> Jing
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> 



